Preface


C# 8.0 represents the seventh major update to Microsoft's flagship programming
language, positioning C# as a language with unusual flexibility and breadth. At one
end, it offers high-level abstractions such as query expressions and asynchronous
continuations, whereas at the other end, it allows low-level efficiency through
constructs such as custom value types and optional pointers.


The price of this growth is that there’s more than ever to learn. Although tools such 
as Microsoft’s IntelliSense—and online references—are excellent in helping you on 
the job, they presume an existing map of conceptual knowledge. This book provides 
exactly that map of knowledge in a concise and unified style—free of clutter and 
long introductions. 


Like the past five editions, C# 8.0 in a Nutshell is organized around concepts and use
cases, making it friendly both to sequential reading and to random browsing. It also
plumbs significant depths while assuming only basic background knowledge, making
it accessible to intermediate as well as advanced readers.


This book covers C#, the Common Language Runtime (CLR), and the essential .NET
Core assemblies. We've chosen this focus to allow space for difficult topics such as
concurrency, security, and access to operating system functionality, without
compromising depth or readability. Features new to C# 8 are flagged so that you can
also use this book as a reference for C# 7.


Intended Audience 


This book targets intermediate to advanced audiences. No prior knowledge of C# is 
required, but some general programming experience is necessary. For the beginner, 
this book complements, rather than replaces, a tutorial-style introduction to 
programming. 


This book is an ideal companion to any of the vast array of books that focus on an
applied technology such as ASP.NET Core, Windows Presentation Foundation (WPF),
and Universal Windows Platform (UWP). The areas of the language and .NET Core
that such books omit, C# 8.0 in a Nutshell covers in detail, and vice versa.


If you're looking for a book that skims every .NET technology, this is not for you. 
This book is also unsuitable if you want to learn about APIs specific to mobile 
device development. 


How This Book Is Organized 


Chapters 2 through 4 concentrate purely on C#, starting with the basics of syntax,
types, and variables, and finishing with advanced topics such as unsafe code and
preprocessor directives. If you're new to the language, you should read these
chapters sequentially.


The remaining chapters cover essential elements of .NET Core, including such
topics as Language-Integrated Query (LINQ), XML, collections, concurrency, I/O and
networking, memory management, reflection, dynamic programming, attributes,
security, and native interoperability. You can read most of these chapters randomly,
except for Chapters 5 and 6, which lay a foundation for subsequent topics. You're
also best off reading the three chapters on LINQ in sequence, and some chapters
assume some knowledge of concurrency, which we cover in Chapter 14.


What You Need to Use This Book 


The examples in this book require .NET Core 3. You will also find Microsoft's .NET
documentation, which is available online, useful for looking up individual types and
members.


Although it’s possible to write source code in Notepad and build your program from 
the command line, you'll be much more productive with a code scratchpad for 
instantly testing code snippets, plus an Integrated Development Environment (IDE) 
for producing executables and libraries. 


For a Windows code scratchpad, download LINQPad 6 from www.linqpad.net
(free). LINQPad fully supports C# 8.0 and is maintained by one of the authors.


For a Windows IDE, download Visual Studio 2019: any edition is suitable for what's 
taught in this book. For a cross-platform IDE, download Visual Studio Code. 


All code listings for all chapters are available as interactive
(editable) LINQPad samples. You can download the entire lot
in a single click: at the bottom left, click LINQPad's Samples
tab, click “Download more samples,” and then choose
“C# 8.0 in a Nutshell.”


.NET Core is available for Windows, Linux, and macOS. Certain
cross-platform features were tested on Ubuntu Linux 18.04.
That code is available on GitHub.
Conventions Used in This Book


The book uses basic UML notation to illustrate relationships between types, as
shown in Figure P-1. A slanted rectangle means an abstract class; a circle means an
interface. A line with a hollow triangle denotes inheritance, with the triangle pointing
to the base type. A line with an arrow denotes a one-way association; a line
without an arrow denotes a two-way association.





[Diagram legend: an interface; an abstract class; a unidirectional association from a referencing type's property to a referenced type; a bidirectional association between properties of two referencing types]
Figure P-1. Sample diagram


The following typographical conventions are used in this book: 


Italic 
Indicates new terms, URIs, filenames, and directories 


Constant width 
Indicates C# code, keywords and identifiers, and program output 


Constant width bold 
Shows a highlighted section of code 


Constant width italic 
Shows text that should be replaced with user-supplied values 


This element signifies a general note. 
Using Code Examples


This book is here to help you get your job done. In general, you may use the code in
this book in your programs and documentation. You do not need to contact us for
permission unless you're reproducing a significant portion of the code. For example,
writing a program that uses several chunks of code from this book does not require
permission. Selling or distributing examples from O'Reilly books does require
permission. Answering a question by citing this book and quoting example code does
not require permission (although we appreciate attribution). Incorporating a
significant amount of example code from this book into your product's documentation
does require permission.


If you feel your use of code examples falls outside fair use or the permission given 
here, feel free to contact us at permissions@oreilly.com. 


O'Reilly Online Learning 


For more than 40 years, O'Reilly Media has provided technology and business
training, knowledge, and insight to help companies succeed.


Our unique network of experts and innovators share their
knowledge and expertise through books, articles, and our online learning platform.
O'Reilly's online learning platform gives you on-demand access to live training
courses, in-depth learning paths, interactive coding environments, and a vast
collection of text and video from O'Reilly and 200+ other publishers. For more
information, please visit http://oreilly.com.


We'd Like to Hear from You 


Please address comments and questions concerning this book to the publisher: 


O'Reilly Media, Inc. 

1005 Gravenstein Highway North 

Sebastopol, CA 95472 

800-998-9938 (in the United States or Canada) 
707-829-0515 (international or local) 
707-829-0104 (fax) 


We have a web page for this book, where we list errata, examples, and any additional 
information. You can access this page at: 


• https://oreil.ly/c-sharp-8
Code listings and additional resources are provided at: 


• http://www.albahari.com/nutshell
To comment or ask technical questions about this book, send an email to:
• bookquestions@oreilly.com


For more information about our books, conferences, Resource Centers, and the 
O'Reilly Network, see our website at: 


• http://www.oreilly.com
Introducing C# and .NET Core


C# is a general-purpose, type-safe, object-oriented programming language. The goal
of the language is programmer productivity. To this end, C# balances simplicity,
expressiveness, and performance. The chief architect of the language since its first
version has been Anders Hejlsberg (creator of Turbo Pascal and architect of Delphi).
The C# language is platform neutral and works with a range of platform-specific
frameworks.


Object Orientation 


C# is a rich implementation of the object-orientation paradigm, which includes
encapsulation, inheritance, and polymorphism. Encapsulation means creating a
boundary around an object to separate its external (public) behavior from its
internal (private) implementation details. Following are the distinctive features of C#
from an object-oriented perspective:


Unified type system
The fundamental building block in C# is an encapsulated unit of data and
functions called a type. C# has a unified type system in which all types ultimately
share a common base type. This means that all types, whether they represent
business objects or are primitive types such as numbers, share the same basic
functionality. For example, an instance of any type can be converted to a string
by calling its ToString method.


Classes and interfaces
In a traditional object-oriented paradigm, the only kind of type is a class. In C#,
there are several other kinds of types, one of which is an interface. An interface
is like a class that cannot hold data. This means that it can define only behavior
(and not state), which allows for multiple inheritance as well as a separation
between specification and implementation.





Properties, methods, and events
In the pure object-oriented paradigm, all functions are methods. In C#,
methods are only one kind of function member, which also includes properties and
events (there are others, too). Properties are function members that encapsulate
a piece of an object's state, such as a button's color or a label's text. Events are
function members that simplify acting on object state changes.
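The following minimal sketch illustrates each feature listed above. It uses hypothetical types (`IShape`, `Square`) and a hypothetical `Resized` event that are not part of the text:

- an `int` and a `Square` both inherit `ToString` from the common base type;
- an interface specifies behavior only;
- a class exposes a property and an event.

```csharp
using System;

// Hypothetical interface: specifies behavior (Area) but holds no state.
interface IShape
{
    double Area { get; }
}

// Hypothetical class implementing the interface.
class Square : IShape
{
    double side;

    // Event: lets subscribers react when the object's state changes.
    public event EventHandler Resized;

    // Property: encapsulates a piece of state behind get/set accessors.
    public double Side
    {
        get => side;
        set
        {
            side = value;
            Resized?.Invoke (this, EventArgs.Empty);   // notify subscribers, if any
        }
    }

    public double Area => side * side;
}

static class OODemo
{
    // Counts how many times Resized fires while Side is set twice.
    public static int CountResizes()
    {
        int count = 0;
        var square = new Square();
        square.Resized += (sender, e) => count++;
        square.Side = 2;
        square.Side = 3;
        return count;
    }

    // Works with any type that implements IShape.
    public static double AreaOf (IShape shape) => shape.Area;

    static void Main()
    {
        // Unified type system: int and Square both have ToString via object.
        Console.WriteLine (42.ToString());                     // 42
        Console.WriteLine (new Square().ToString());           // Square
        Console.WriteLine (CountResizes());                    // 2
        Console.WriteLine (AreaOf (new Square { Side = 3 }));  // 9
    }
}
```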


Although C# is primarily an object-oriented language, it also borrows from the 
functional programming paradigm; specifically: 


Functions can be treated as values 
Using delegates, C# allows functions to be passed as values to and from other 
functions. 


C# supports patterns for purity
Core to functional programming is avoiding the use of variables whose values
change, in favor of declarative patterns. C# has key features to help with those
patterns, including the ability to write unnamed functions on the fly that “capture”
variables (lambda expressions), and the ability to perform list or reactive
programming via query expressions. C# also makes it easy to define read-only
fields and properties for writing immutable (read-only) types.
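Here is a small sketch of the two functional features above. `ApplyTwice`, `MakeMultiplier`, and `SquaresOfEvens` are hypothetical names chosen for illustration. A function is passed as a value through the `Func<int,int>` delegate type, and a lambda captures the outer variable `factor`. A query expression then filters and projects its input without mutating it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FunctionalDemo
{
    // A function passed as a value (Func<int,int>) to another function.
    public static int ApplyTwice (Func<int, int> f, int x) => f (f (x));

    // Returns a lambda that captures the parameter 'factor'.
    public static Func<int, int> MakeMultiplier (int factor) => n => n * factor;

    // A query expression: filters and projects without changing the source.
    public static List<int> SquaresOfEvens (IEnumerable<int> source) =>
        (from n in source where n % 2 == 0 select n * n).ToList();

    static void Main()
    {
        Func<int, int> triple = MakeMultiplier (3);
        Console.WriteLine (ApplyTwice (triple, 2));                                 // 18
        Console.WriteLine (string.Join (",", SquaresOfEvens (new[] { 1, 2, 3, 4 })));  // 4,16
    }
}
```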


Type Safety 


C# is primarily a type-safe language, meaning that instances of types can interact
only through protocols they define, thereby ensuring each type's internal
consistency. For instance, C# prevents you from interacting with a string type as though it
were an integer type.


More specifically, C# supports static typing, meaning that the language enforces type 
safety at compile time. This is in addition to type safety being enforced at runtime. 


Static typing eliminates a large class of errors before a program is even run. It shifts 
the burden away from runtime unit tests onto the compiler to verify that all the 
types in a program fit together correctly. This makes large programs much easier to 
manage, more predictable, and more robust. Furthermore, static typing allows tools 
such as IntelliSense in Visual Studio to help you write a program, because it knows 
for a given variable what type it is, and hence what methods you can call on that 
variable. Such tools can also identify everywhere in your program that a variable, 
type, or method is used, allowing for reliable refactoring. 


C# also allows parts of your code to be dynamically typed via 
the dynamic keyword. However, C# remains a predominantly 
statically typed language. 


C# is also called a strongly typed language because its type rules are strictly enforced 
(whether statically or at runtime). For instance, you cannot call a function that’s 
designed to accept an integer with a floating-point number, unless you first explicitly 
convert the floating-point number to an integer. This helps prevent mistakes. 
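As a brief illustration, the sketch below passes a `double` to a method that expects an `int`. The method names `Square` and `SquareOfTruncated` are hypothetical. The implicit call is rejected at compile time, so it appears only as a comment; the explicit cast compiles and truncates toward zero.

```csharp
using System;

static class ConversionDemo
{
    // Hypothetical method designed to accept an integer.
    public static int Square (int n) => n * n;

    public static int SquareOfTruncated (double d)
    {
        // return Square (d);     // Compile-time error: no implicit conversion from double to int
        return Square ((int) d);  // OK: explicit cast truncates toward zero (3.7 becomes 3)
    }

    static void Main() => Console.WriteLine (SquareOfTruncated (3.7));   // 9
}
```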
Memory Management


C# relies on the runtime to perform automatic memory management. The Common
Language Runtime has a garbage collector that executes as part of your program,
reclaiming memory for objects that are no longer referenced. This frees
programmers from explicitly deallocating the memory for an object, eliminating the
problem of incorrect pointers encountered in languages such as C++.


C# does not eliminate pointers: it merely makes them unnecessary for most
programming tasks. For performance-critical hotspots and interoperability, pointers
and explicit memory allocation are permitted in blocks that are marked unsafe.


Platform Support 


Historically, C# was used almost entirely for writing code to run on Windows
platforms. However, Microsoft and other companies have since invested in other
platforms:


• The .NET Core Framework enables web application development in Linux and
macOS (as well as Windows).


• Xamarin enables mobile app development for iOS and Android.


• Blazor compiles C# to WebAssembly that can run in a browser.


And on the Windows platform:


• .NET Core 3 enables rich-client and web application development on Windows
7 to 10.


• Universal Windows Platform (UWP) supports Windows 10 desktop and devices
such as Xbox, Surface Hub, and HoloLens.


C# and the Common Language Runtime 


C# depends on a Common Language Runtime (CLR), which provides essential
runtime services such as automatic memory management and exception handling.
(The word common refers to the fact that the same runtime can be shared by other
managed programming languages, such as F#, Visual Basic, and Managed C++.)


C# is called a managed language because it compiles source code into managed
code, which is represented in Intermediate Language (IL). The CLR converts the IL
into the native code of the machine, such as x86 or x64, usually just prior to
execution. This is referred to as Just-In-Time (JIT) compilation. Ahead-of-time
compilation is also available to improve startup time with large assemblies or
resource-constrained devices (and to satisfy iOS app store rules when developing with
Xamarin).
The container for managed code is called an assembly. An assembly contains not
only IL, but type information (metadata). The presence of metadata allows
assemblies to reference types in other assemblies without needing additional files.


You can examine and disassemble the contents of an assembly 
with Microsoft's ildasm tool. And with tools such as ILSpy or 
JetBrains dotPeek, you can go further and decompile the IL to 
C#. Because IL is higher-level than native machine code, the 
decompiler can do quite a good job of reconstructing the 
original C#. 


A program can query its own metadata (reflection) and even generate new IL at
runtime (Reflection.Emit).
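A minimal sketch of the reflection half of this idea follows; it does not attempt Reflection.Emit. The `Customer` type is hypothetical. The program reads the names of `Customer`'s public instance properties from the assembly's own metadata at runtime.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical type whose metadata we inspect.
class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static class ReflectionDemo
{
    // Queries the metadata for Customer's public instance properties, sorted by name.
    public static string[] PropertyNames() =>
        typeof (Customer).GetProperties (BindingFlags.Public | BindingFlags.Instance)
                         .Select (p => p.Name)
                         .OrderBy (name => name)
                         .ToArray();

    static void Main() => Console.WriteLine (string.Join (", ", PropertyNames()));  // Age, Name
}
```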


Frameworks and Base Class Libraries 


A CLR does not ship on its own, but as part of a framework that includes a standard 
set of assemblies. When writing an application, you target a particular framework, 
which means that your application uses and depends on the functionality that the 
framework provides. Your choice of framework also determines which platforms 
your application will support. 


A framework comprises three layers, as illustrated in Figure 1-1. The Base Class
Libraries (BCL) sit atop the CLR, providing features useful to any kind of
application (such as collections, XML/JSON, input/output [I/O], networking, serialization,
and parallel programming). Sitting atop the BCL are application framework layers,
which provide the APIs for a user interface paradigm (such as ASP.NET Core for a
web application, or Windows Presentation Foundation [WPF] for a rich-client
application). A command-line program does not require an application layer.





[Diagram: three stacked layers. The Application Framework (APIs specific to writing web or rich-client applications: ASP.NET Core, WPF, WinForms, UWP, Xamarin) sits atop the Base Class Libraries (BCL: lower-level functionality, e.g., collections, threading, networking, I/O, XML/JSON), which sit atop the Common Language Runtime (CLR).]
Figure 1-1. Framework architecture
When C# was first released in 2000, there was just the Microsoft .NET Framework.
Now there are four major framework choices: 


.NET Core 
Modern open source framework for writing web and console applications that 
run on Windows, Linux, and macOS—and rich-client applications that run on 
Windows 7 through 10 (with .NET Core 3+). You can install multiple versions 
of .NET Core side by side, and applications can be self-contained, so as not to 
require a .NET Core installation. 


UWP 
For writing immersive touch-first applications that run on Windows 10 desktop
and devices (Xbox, Surface Hub, and HoloLens). UWP apps are sandboxed
and ship via the Windows Store. UWP is preinstalled with Windows 10.


Mono + Xamarin 
Open source framework for writing mobile apps that run on iOS and Android. 


.NET Framework (superseded by .NET Core 3) 
For writing web and rich-client applications that target Windows desktop/ 
server. No major new releases are planned, although Microsoft will continue to 
support and maintain the current 4.8 release due to the wealth of existing 
applications. .NET Framework is preinstalled in Windows and supports C# 7.3 
and earlier. 


Although each of these frameworks differs in its platform support and intended
uses, they all expose a similar CLR and BCL.


You can take advantage of this commonality and write class
libraries that work across multiple frameworks (see “.NET
Standard” on page 231 in Chapter 5).


This book focuses on C# and the core functionality of the CLR and BCL, as shown 
in Figure 1-2. Even though the main emphasis is on .NET Core 3, we also cover 
some of the Windows Runtime types for UWP applications that provide functional- 
ity in parallel to the BCL. 
[Diagram: the application frameworks (Windows Desktop, ASP.NET Core, UWP, Xamarin) sit atop the BCL and CLR of .NET Core 3, .NET Core 2.2, Mono, and the Windows Runtime. C# is covered in Chapters 1 to 4. The Base Class Libraries and CLR are covered in Chapters 5 (Framework Overview), 6 (Framework Fundamentals), 7 (Collections), 8 (LINQ Queries), 9 (LINQ Operators), 10 (LINQ to XML), 11 (Other XML/JSON Technologies), 12 (Disposal and Garbage Collection), 13 (Diagnostics), 14 (Concurrency and Asynchrony), 15 (Streams and I/O), 16 (Networking), 17 (Serialization), 18 (Assemblies), 19 (Reflection and Metadata), 20 (Dynamic Programming), 21 (Cryptography), 22 (Advanced Threading), 23 (Parallel Programming), 24 (Span<T> & Memory<T>), 25 (Native and COM Interop), 26 (Regular Expressions), and 27 (The Roslyn Compiler).]
Figure 1-2. Topics covered in this book; the application frameworks (shown in gray)
are not covered


Legacy and Niche Frameworks 


The following frameworks are still available to support older platforms: 


• Windows Runtime for Windows 8/8.1 (now UWP)
• Microsoft XNA for game development (now UWP)
• .NET Core 1.x and 2.x (for web and command-line applications only)


There are also the following niche frameworks: 
• The .NET Micro Framework is for running .NET code on highly
resource-constrained embedded devices (under one megabyte).


• Mono (upon which Xamarin sits) also has an application layer to develop
cross-platform desktop “Windows Forms” applications on Linux, macOS, and
Windows. Not all features are supported or work fully. (Another option for
cross-platform user interface [UI] development is Avalonia, which is a
WPF-inspired library that runs atop .NET Core and .NET Framework.)


• Unity is a game development platform that allows game logic to be scripted
with C#.


It's also possible to run managed code within SQL Server. With SQL Server CLR 
integration, you can write custom functions, stored procedures, and aggregations in 
C# and then call them from SQL. This works in conjunction with .NET Framework 
and a special “hosted” CLR that enforces a sandbox to protect the integrity of the 
SQL Server process. 


Windows Runtime 


C# also interoperates with Windows Runtime (WinRT) technology. WinRT means 
two things: 


• A language-neutral object-oriented execution interface supported in Windows
8 and above


• A set of libraries baked into Windows 8 and above that support this execution
interface


Somewhat confusingly, the term WinRT has historically been
used to mean two more things:


• The predecessor to UWP; that is, the development
platform for writing Store apps for Windows 8/8.1,
sometimes called “Metro” or “Modern”


• The defunct mobile operating system for RISC-based
tablets (“Windows RT”) that Microsoft released in 2012


By execution interface, we mean a protocol for calling code that's (potentially)
written in another language. Microsoft Windows has historically provided a primitive
execution interface in the form of low-level C-style function calls comprising the
Win32 API.


WinRT is much richer. In part, it is an enhanced version of the Component Object
Model (COM) that supports .NET, C++, and JavaScript. Unlike Win32, it's object
oriented and has a relatively rich type system. This means that referencing a WinRT
library from C# feels much like referencing a .NET library; you might not even be
aware that you're using WinRT.
The WinRT libraries in Windows 10 form an essential part of the UWP platform
(UWP relies on both WinRT and .NET Core libraries). If you're targeting the
standard .NET Core platform, referencing the Windows 10 WinRT libraries is optional
and can be useful if you need to access Windows 10-specific features not otherwise
covered in .NET Core.


The WinRT libraries in Windows 10 support the UWP UI for writing immersive
touch-first applications. They also support mobile-device-specific features such as
sensors, text messaging, and so on (the new functionality of Windows 8, 8.1, and 10
has been exposed through WinRT rather than Win32). WinRT libraries also provide
file I/O tailored to work well within the UWP sandbox.


What distinguishes WinRT from ordinary COM is that WinRT projects its libraries
into a multitude of languages, namely C#, Visual Basic, C++, and JavaScript, so that
each language sees WinRT types (almost) as though they were written especially for
it. For example, WinRT will adapt capitalization rules to suit the standards of the
target language and will even remap some functions and interfaces. WinRT
assemblies also ship with rich metadata in .winmd files, which have the same format
as .NET assembly files, allowing transparent consumption without special ritual;
this is why you might be unaware that you're using WinRT rather than .NET types,
aside from namespace differences. Another clue is that WinRT types are subject to
COM-style restrictions; for instance, they offer limited support for inheritance and
generics.


In C#, you not only can consume WinRT libraries, you can also write your own 
(and call them from a JavaScript application). 


A Brief History of C#


The following is a reverse chronology of the new features in each C# version, for the 
benefit of readers who are already familiar with an older version of the language. 


What's New in C# 8.0 
C# 8.0 ships with Visual Studio 2019. 


Indices and ranges 


Indices and ranges simplify working with elements or portions of an array (or the 
low-level types Span<T> and ReadOnlySpan<T>). 


Indices let you refer to elements relative to the end of an array by using the ^
operator. ^1 refers to the last element, ^2 refers to the second-to-last element, and so on:

char[] vowels = new char[] {'a','e','i','o','u'};

char lastElement  = vowels [^1];   // 'u'
char secondToLast = vowels [^2];   // 'o'


Ranges let you “slice” an array by using the .. operator: 
char[] firstTwo =  vowels [..2];    // 'a', 'e'
char[] lastThree = vowels [2..];    // 'i', 'o', 'u'
char[] middleOne = vowels [2..3];   // 'i'
char[] lastTwo =   vowels [^2..];   // 'o', 'u'


C# implements indexes and ranges with the help of the Index and Range types: 


Index last = ^1;
Range firstTwoRange = 0..2; 
char[] firstTwo = vowels [firstTwoRange]; // 'a', 'e' 


You can support indices and ranges in your own classes by defining an indexer with 
a parameter type of Index or Range: 


class Sentence
{
  string[] words = "The quick brown fox".Split();

  public string this   [Index index] => words [index];
  public string[] this [Range range] => words [range];
}
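To see the `Sentence` indexers in action, here is a self-contained sketch that repeats the class and exercises both indexers. `SentenceDemo` is a hypothetical wrapper. Slicing an array with a `Range` relies on runtime support that ships with .NET Core 3.0 and later.

```csharp
using System;

class Sentence
{
    string[] words = "The quick brown fox".Split();   // splits on whitespace

    public string this   [Index index] => words [index];
    public string[] this [Range range] => words [range];
}

static class SentenceDemo
{
    static void Main()
    {
        var sentence = new Sentence();
        Console.WriteLine (sentence [^1]);                        // fox
        Console.WriteLine (string.Join (" ", sentence [1..3]));   // quick brown
        Console.WriteLine (sentence [0]);                         // The (int converts implicitly to Index)
    }
}
```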


For more information, see “Indices and Ranges (C# 8)” on page 49 in Chapter 2. 


Null-coalescing assignment 

The ??= operator assigns a variable only if it’s null. Instead of this: 
if (s == null) s = "Hello, world"; 

you can now write this: 


s ??= "Hello, world"; 
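A small sketch showing that `??=` leaves a non-null value untouched follows. `WithDefault` is a hypothetical helper name.

```csharp
using System;

static class CoalesceDemo
{
    // Assigns the default only when the argument is null.
    public static string WithDefault (string s)
    {
        s ??= "Hello, world";
        return s;
    }

    static void Main()
    {
        Console.WriteLine (WithDefault (null));   // Hello, world
        Console.WriteLine (WithDefault ("Hi"));   // Hi
    }
}
```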


using declarations 


If you omit the brackets and statement block following a using statement, it
becomes a using declaration. The resource is then disposed when execution falls
outside the enclosing statement block:


if (File.Exists ("file.txt"))
{
  using var reader = File.OpenText ("file.txt");
  Console.WriteLine (reader.ReadLine());
}


In this case, reader will be disposed when execution falls outside the if statement 
block. 
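To make the disposal point observable without touching the file system, the sketch below substitutes a hypothetical `Tracker` class that logs when it is created and disposed. The log shows that disposal happens as execution leaves the enclosing block, before the next statement runs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable that records its lifetime in a shared log.
class Tracker : IDisposable
{
    readonly List<string> log;
    public Tracker (List<string> log) { this.log = log; log.Add ("created"); }
    public void Dispose() => log.Add ("disposed");
}

static class UsingDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        {
            using var tracker = new Tracker (log);   // using declaration
            log.Add ("working");
        }   // tracker is disposed here, as execution leaves the enclosing block
        log.Add ("after block");
        return log;
    }

    static void Main() => Console.WriteLine (string.Join (", ", Run()));
}
```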


readonly members 


C# 8 lets you apply the readonly modifier to a struct’s functions, ensuring that if the 
function attempts to modify any field, a compile-time error is generated: 
struct Point


{ 

public int X, Y; 

public readonly void ResetX() => X = 0; // Error! 
} 


If a readonly function calls a non-readonly function, the compiler generates a 
warning (and defensively copies the struct to avoid the possibility of a mutation). 
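The snippet above shows only the error case. The following sketch shows readonly members that do compile: `LengthSquared` (a hypothetical name) and an override of `ToString` only read fields. A non-readonly `ResetX` is still free to mutate the struct.

```csharp
using System;

struct Point
{
    public int X, Y;

    // readonly member: only reads fields, so the compiler accepts it.
    public readonly int LengthSquared() => X * X + Y * Y;

    // readonly can be combined with override.
    public readonly override string ToString() => $"({X}, {Y})";

    // Not readonly: allowed to assign fields.
    public void ResetX() => X = 0;
}

static class ReadonlyDemo
{
    public static (int, string) Run()
    {
        var p = new Point { X = 3, Y = 4 };
        int lengthSquared = p.LengthSquared();   // 25
        p.ResetX();                               // mutation is fine in a non-readonly member
        return (lengthSquared, p.ToString());     // (25, "(0, 4)")
    }

    static void Main() => Console.WriteLine (Run());
}
```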


static local methods 


Adding the static modifier to a local method prevents it from seeing the local
variables and parameters of the enclosing method. This helps to reduce coupling as well
as enabling the local method to declare variables as it pleases, without risk of
colliding with those in the containing method.


Default interface members 


C# 8 lets you add a default implementation to an interface member, making it 
optional to implement: 


interface ILogger 


{ 


void Log (string text) => Console.WriteLine (text); 
} 


This means that you can add a member to an interface without breaking
implementations. Default implementations must be called explicitly through the interface:


((ILogger)new Logger()).Log ("message"); 


Interfaces can also define static members (including fields), which can be accessed 
from code inside default implementations: 


interface ILogger 


{ 
void Log (string text) => Console.WriteLine (Prefix + text); 
static string Prefix = ""; 


} 
or from outside the interface: 


ILogger.Prefix = "File log: "; 


unless restricted via an accessibility modifier on the static interface member (such 
as private, protected, or internal). Instance fields are prohibited. 


For more details, see “Default Interface Members (C# 8)” on page 129 in Chapter 3. 
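The interface fragments above can be assembled into one program, sketched here under these assumptions:

- `Logger` implements nothing itself, so `Log` is reachable only through an `ILogger`-typed reference.
- `Format` is a hypothetical extra default member, added so that the result can be checked without reading console output.
- Default interface members require .NET Core 3.0 or later.

```csharp
using System;

interface ILogger
{
    // Default implementation: implementing classes may omit it.
    void Log (string text) => Console.WriteLine (Prefix + text);

    // Hypothetical helper, also a default member: formats without printing.
    string Format (string text) => Prefix + text;

    // Static field, accessible from default implementations and from outside.
    static string Prefix = "";
}

// Implements ILogger without implementing any of its members.
class Logger : ILogger { }

static class LoggerDemo
{
    public static string Run()
    {
        ILogger.Prefix = "File log: ";
        ILogger logger = new Logger();   // 'new Logger().Log (...)' would not compile
        logger.Log ("message");          // prints: File log: message
        return logger.Format ("message");
    }

    static void Main() => Run();
}
```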
switch expressions
From C# 8, you can use switch in the context of an expression: 


string cardName = cardNumber switch    // assuming cardNumber is an int
{
  13 => "King",
  12 => "Queen",
  11 => "Jack",
  _ => "Pip card"   // equivalent to 'default'
};


For more examples, see “switch expressions (C# 8)” on page 77 in Chapter 2. 
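The same switch expression can be wrapped in a method so that it runs end to end. `CardName` is a hypothetical method name. The `_` (discard) arm catches every value not matched above it.

```csharp
using System;

static class CardDemo
{
    public static string CardName (int cardNumber) => cardNumber switch
    {
        13 => "King",
        12 => "Queen",
        11 => "Jack",
        _ => "Pip card"   // discard pattern: matches anything else
    };

    static void Main()
    {
        Console.WriteLine (CardName (12));   // Queen
        Console.WriteLine (CardName (7));    // Pip card
    }
}
```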


Tuple, positional, and property patterns 


C# 8 supports three new patterns, mostly for the benefit of switch
statements/expressions (see “Patterns” on page 201 in Chapter 4). Tuple patterns let you switch
on multiple values:


int cardNumber = 12; string suit = "spades";
string cardName = (cardNumber, suit) switch
{
  (13, "spades") => "King of spades",
  (13, "clubs") => "King of clubs",
  ...
};


Positional patterns allow a similar syntax for objects that expose a deconstructor, 
and property patterns let you match on an object’s properties. You can use all of the 
patterns both in switches and by the is operator. The following example uses a 
property pattern to test whether obj is a string with a length of 4: 


if (obj is string { Length:4 }) ... 
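The sketch below puts all three patterns in one runnable program, under these assumptions:

- `Point` is a hypothetical class whose `Deconstruct` method enables the positional pattern.
- `Quadrant`, `IsFourCharString`, and the fallback text "Some other card" are illustrative names and strings, not from the text.

```csharp
using System;

// Hypothetical type with a deconstructor, used by the positional pattern.
class Point
{
    public int X { get; }
    public int Y { get; }
    public Point (int x, int y) { X = x; Y = y; }
    public void Deconstruct (out int x, out int y) { x = X; y = Y; }
}

static class PatternDemo
{
    // Tuple pattern: switch on two values at once.
    public static string CardName (int cardNumber, string suit) => (cardNumber, suit) switch
    {
        (13, "spades") => "King of spades",
        (13, "clubs") => "King of clubs",
        _ => "Some other card"
    };

    // Positional pattern: matches on the values produced by Point.Deconstruct.
    public static string Quadrant (Point p) => p switch
    {
        (0, 0) => "origin",
        (var x, var y) when x > 0 && y > 0 => "first quadrant",
        _ => "elsewhere"
    };

    // Property pattern used with the is operator.
    public static bool IsFourCharString (object obj) => obj is string { Length: 4 };

    static void Main()
    {
        Console.WriteLine (CardName (13, "clubs"));         // King of clubs
        Console.WriteLine (Quadrant (new Point (2, 3)));    // first quadrant
        Console.WriteLine (IsFourCharString ("abcd"));      // True
    }
}
```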


Nullable reference types 


Whereas nullable value types bring nullability to value types, nullable reference types
do the opposite and bring (a degree of) non-nullability to reference types, with the
purpose of helping to avoid NullReferenceExceptions. Nullable reference types
introduce a level of safety that's enforced purely by the compiler in the form of
warnings or errors when it detects code that's at risk of generating a
NullReferenceException.


Nullable reference types can be enabled either at the project level (via the Nullable 
element in the .csproj project file) or in code (via the #nullable directive). After it’s 
enabled, the compiler makes non-nullability the default: if you want a reference type 
to accept nulls, you must apply the ? suffix to indicate a nullable reference type: 
#nullable enable    // Enable nullable reference types from this point on

string s1 = null;   // Generates a compiler warning! (s1 is non-nullable)
string? s2 = null;  // OK: s2 is nullable reference type


Uninitialized fields also generate a warning (if the type is not marked as nullable), as
does dereferencing a nullable reference type, if the compiler thinks a
NullReferenceException might occur:


void Foo (string? s) => Console.Write (s.Length);   // Warning (.Length)

To remove the warning, you can use the null-forgiving operator (!):

void Foo (string? s) => Console.Write (s!.Length);


For a full discussion, see “Nullable Reference Types (C# 8)” on page 191 in 
Chapter 4. 
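Besides the null-forgiving operator, a null check also satisfies the compiler's flow analysis. The sketch below shows this. `SafeLength` and `Shout` are hypothetical names, and the warning the unchecked dereference would produce appears only in a comment.

```csharp
#nullable enable
using System;

static class NullableDemo
{
    // 'name' may be null, so it is declared string?.
    public static int SafeLength (string? name)
    {
        // return name.Length;       // would warn: possibly null dereference
        if (name == null) return 0;  // after this check the compiler treats name as non-null
        return name.Length;          // no warning
    }

    // Non-nullable parameter: passing null here would generate a warning.
    public static string Shout (string text) => text.ToUpperInvariant();

    static void Main()
    {
        Console.WriteLine (SafeLength (null));    // 0
        Console.WriteLine (SafeLength ("abc"));   // 3
        Console.WriteLine (Shout ("hi"));         // HI
    }
}
```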


Asynchronous streams 


Prior to C# 8, you could use yield return to write an iterator, or await to write an
asynchronous function. But you couldn't do both and write an iterator that awaits,
yielding elements asynchronously. C# 8 fixes this through the introduction of
asynchronous streams:


async IAsyncEnumerable<int> RangeAsync ( 
int start, int count, int delay) 


{ 


for (int i = start; i < start + count; i++) 
{ 
await Task.Delay (delay); 
yield return i; 
} 
} 


The await foreach statement consumes an asynchronous stream: 


await foreach (var number in RangeAsync (0, 10, 100)) 
Console.WriteLine (number); 


For more information, see “Asynchronous Streams (C# 8)” on page 616 in 
Chapter 14. 
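To run the iterator and the `await foreach` loop together, they can be placed in one program, as sketched here. `CollectAsync` is a hypothetical helper that gathers the streamed values into a list so they can be checked. `IAsyncEnumerable<T>` is part of .NET Core 3.0 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class AsyncStreamDemo
{
    // Asynchronous iterator: awaits between yields.
    public static async IAsyncEnumerable<int> RangeAsync (int start, int count, int delay)
    {
        for (int i = start; i < start + count; i++)
        {
            await Task.Delay (delay);
            yield return i;
        }
    }

    // Consumes the stream with await foreach, collecting each value.
    public static async Task<List<int>> CollectAsync (int start, int count, int delay)
    {
        var results = new List<int>();
        await foreach (var number in RangeAsync (start, count, delay))
            results.Add (number);
        return results;
    }

    static async Task Main()
    {
        foreach (int n in await CollectAsync (0, 3, 10))
            Console.WriteLine (n);   // 0, 1, 2 (one per line)
    }
}
```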

What's New in C# 7.x 

C# 7 shipped with Visual Studio 2017. 


7.3 


C# 7.3 made minor improvements to existing features, such as enabling use of the 
equality operators with tuples, improved overload resolution, and the ability to 
apply attributes to the backing fields of automatic properties: 


[field:NonSerialized] 
public int MyProperty { get; set; } 
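Here is a small sketch of the C# 7.3 tuple equality operators. `SameTime` is a hypothetical method. `==` and `!=` compare tuples element by element, and element names play no part in the comparison.

```csharp
using System;

static class TupleEqualityDemo
{
    // Element-wise comparison of two (Hour, Minute) tuples.
    public static bool SameTime ((int Hour, int Minute) a, (int Hour, int Minute) b) => a == b;

    static void Main()
    {
        Console.WriteLine (SameTime ((9, 30), (9, 30)));   // True
        Console.WriteLine ((1, 2) != (1, 3));              // True
    }
}
```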
C# 7.3 also built on C# 7.2's advanced low-allocation programming features, with
the ability to reassign ref locals, no requirement to pin when indexing fixed fields,
and array initializer support with stackalloc:

int* pointer = stackalloc int[] {1, 2, 3}; 

Span<int> arr = stackalloc [] {1, 2, 3}; 


Notice that stack-allocated memory can be assigned directly to a Span<T>. We 
describe spans—and why you would use them—in Chapter 24. 
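Because the stack memory is wrapped in a Span<int>, no unsafe context is needed. A minimal sketch (StackallocDemo and SumOnStack are hypothetical names):

```csharp
using System;

static class StackallocDemo
{
    // Allocates three ints on the stack (no heap allocation, no unsafe code)
    // and sums them through the Span<int>.
    public static int SumOnStack()
    {
        Span<int> numbers = stackalloc[] { 1, 2, 3 };
        int total = 0;
        foreach (int n in numbers)
            total += n;
        return total;
    }

    static void Main() => Console.WriteLine (SumOnStack());   // 6
}
```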


C# 7.2


C# 7.2 added a new private protected modifier (the intersection of internal and 
protected), the ability to follow named arguments with positional ones when call- 
ing methods, and readonly structs. A readonly struct enforces that all fields are 
readonly, to aid in declaring intent and to allow the compiler more optimization 
freedom: 


readonly struct Point
{
  public readonly int X, Y;   // X and Y must be readonly
}


C# 7.2 also added specialized features to help with micro-optimization and low- 
allocation programming: see “The in modifier” on page 60, “Ref Locals” on page 63, 
and “Ref Returns” on page 63 in Chapter 2, and “Ref Structs” on page 122 in 
Chapter 3. 
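A sketch that combines two of these C# 7.2 features: a readonly struct whose fields are set only in its constructor, and a named argument followed by a positional one. The constructor and the WithX helper are hypothetical additions to the Point struct shown above:

```csharp
using System;

// Every field of a readonly struct must be readonly, so "changes"
// are expressed by returning a new value.
readonly struct Point
{
    public readonly int X, Y;
    public Point (int x, int y) { X = x; Y = y; }

    // Returns a new Point rather than mutating this one.
    public Point WithX (int x) => new Point (x, Y);
}

static class ReadonlyStructDemo
{
    static void Main()
    {
        // C# 7.2: a named argument may be followed by a positional one,
        // as long as the named argument is in its correct position.
        var p = new Point (x: 1, 2);
        var q = p.WithX (5);
        Console.WriteLine ($"{p.X},{p.Y} {q.X},{q.Y}");   // 1,2 5,2
    }
}
```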


C# 7.1


From C# 7.1, you can omit the type when using the default keyword, if the type 
can be inferred: 


decimal number = default; // number is decimal 


C# 7.1 also relaxed the rules for switch statements (so that you can pattern-match 
on generic type parameters), allowed a program's Main method to be asynchronous, 
and allowed tuple element names to be inferred: 


var now = DateTime.Now; 
var tuple = (now.Hour, now.Minute, now.Second); 
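To show the inferred names in use, here is a sketch that reads the tuple elements by name. It passes a fixed DateTime so the output is predictable; InferenceDemo, Zero, and HourAndMinute are hypothetical names:

```csharp
using System;

static class InferenceDemo
{
    public static decimal Zero()
    {
        decimal number = default;   // default literal: the type (decimal) comes from the target
        return number;
    }

    public static string HourAndMinute (DateTime now)
    {
        var tuple = (now.Hour, now.Minute, now.Second);
        // Element names are inferred from the expressions, so tuple.Hour
        // works as well as tuple.Item1.
        return $"{tuple.Hour}:{tuple.Minute:00}";
    }

    static void Main()
    {
        Console.WriteLine (Zero());                                                 // 0
        Console.WriteLine (HourAndMinute (new DateTime (2019, 9, 1, 14, 5, 0)));   // 14:05
    }
}
```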


Numeric literal improvements 


Numeric literals in C# 7 can include underscores to improve readability. These are 
called digit separators and are ignored by the compiler: 


int million = 1_000_000;

Binary literals can be specified with the 0b prefix:

var b = 0b1010_1011_1100_1101_1110_1111;
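A quick check that separators do not change a literal's value: the binary literal above groups the same bits as the hex literal 0xABCDEF. LiteralDemo is a hypothetical name:

```csharp
using System;

static class LiteralDemo
{
    public const int Million = 1_000_000;                       // separators are ignored
    public const int Binary = 0b1010_1011_1100_1101_1110_1111;   // binary literal
    public const int Hex = 0xAB_CD_EF;                           // same bits, written in hex

    static void Main()
    {
        Console.WriteLine (Million == 1000000);   // True
        Console.WriteLine (Binary == Hex);        // True
        Console.WriteLine (Binary);               // 11259375
    }
}
```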
Out variables and discards


C# 7 makes it easier to call methods that contain out parameters. First, you can now 
declare out variables on the fly (see “Out variables and discards” on page 59 in 
Chapter 2): 


bool successful = int.TryParse ("123", out int result); 
Console.WriteLine (result); 


And when calling a method with multiple out parameters, you can discard ones
you're uninterested in with the underscore character:


SomeBigMethod (out _, out _, out _, out int x, out _, out _, out _); 
Console.WriteLine (x); 
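Since SomeBigMethod is only a placeholder, here is a runnable sketch using int.TryParse instead. It shows an inline out variable and a discard. OutVarDemo, ParseOrDefault, and IsInteger are hypothetical names:

```csharp
using System;

static class OutVarDemo
{
    // Declares the out variable inline; returns -1 when parsing fails.
    public static int ParseOrDefault (string text) =>
        int.TryParse (text, out int result) ? result : -1;

    // Only success matters here, so the parsed value is discarded with _.
    public static bool IsInteger (string text) => int.TryParse (text, out _);

    static void Main()
    {
        Console.WriteLine (ParseOrDefault ("123"));   // 123
        Console.WriteLine (ParseOrDefault ("abc"));   // -1
        Console.WriteLine (IsInteger ("4.2"));        // False
    }
}
```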


Type patterns and pattern variables 


You can also introduce variables on the fly with the is operator. These are called 
pattern variables (see “Introducing a pattern variable” on page 110 in Chapter 3): 


void Foo (object x)
{
  if (x is string s)
    Console.WriteLine (s.Length);
}


The switch statement also supports type patterns, so you can switch on type as well 
as constants (see “Switching on types” on page 75 in Chapter 2). You can specify 
conditions with a when clause and also switch on the null value: 


switch (x)
{
  case int i:
    Console.WriteLine ("It's an int!");
    break;
  case string s:
    Console.WriteLine (s.Length);   // We can use the s variable
    break;
  case bool b when b == true:       // Matches only when b is true
    Console.WriteLine ("True");
    break;
  case null:
    Console.WriteLine ("Nothing");
    break;
}
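A runnable sketch of the same ideas, written as a method that returns a description instead of printing. It shows that a when clause lets two sections match the same type, and that a pattern variable is scoped to its own switch section. PatternDemo and Describe are hypothetical names:

```csharp
using System;

static class PatternDemo
{
    public static string Describe (object x)
    {
        switch (x)
        {
            case int i when i < 0:   // type pattern plus a condition
                return "negative int";
            case int i:              // i is scoped to this section, so the name can be reused
                return "int " + i;
            case string s:
                return "string of length " + s.Length;
            case null:
                return "null";
            default:
                return "something else";
        }
    }

    static void Main()
    {
        Console.WriteLine (Describe (-5));      // negative int
        Console.WriteLine (Describe ("abc"));   // string of length 3
        Console.WriteLine (Describe (null));    // null
    }
}
```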


Local methods 


A local method is a method declared within another function (see “Local methods” 
on page 93 in Chapter 3): 


void WriteCubes()
{
  Console.WriteLine (Cube (3));
  Console.WriteLine (Cube (4));
  Console.WriteLine (Cube (5));

  int Cube (int value) => value * value * value;
}


Local methods are visible only to the containing function and can capture local vari- 
ables in the same way that lambda expressions do. 
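To illustrate capturing, here is a sketch where the local method reads a parameter of its enclosing method. LocalMethodDemo, SumOfPowers, and Pow are hypothetical names:

```csharp
using System;

static class LocalMethodDemo
{
    // Returns 1^power + 2^power + ... + count^power.
    public static int SumOfPowers (int power, int count)
    {
        int total = 0;
        for (int i = 1; i <= count; i++)
            total += Pow (i);
        return total;

        // Local method: visible only inside SumOfPowers. It captures the
        // enclosing parameter 'power', just as a lambda expression would.
        int Pow (int value)
        {
            int result = 1;
            for (int j = 0; j < power; j++) result *= value;
            return result;
        }
    }

    static void Main() => Console.WriteLine (SumOfPowers (3, 3));   // 36
}
```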


More expression-bodied members 


C# 6 introduced the expression-bodied fat-arrow syntax for methods, read-only 
properties, operators, and indexers. C# 7 extends this to constructors, read/write 
properties, and finalizers: 


public class Person
{
  string name;

  public Person (string name) => Name = name;

  public string Name
  {
    get => name;
    set => name = value ?? "";
  }

  ~Person () => Console.WriteLine ("finalize");
}

Deconstructors


C# 7 introduces the deconstructor pattern (see “Deconstructors” on page 95 in 
Chapter 3). Whereas a constructor typically takes a set of values (as parameters) and 
assigns them to fields, a deconstructor does the reverse and assigns fields back to a 
set of variables. We could write a deconstructor for the Person class in the preced- 
ing example as follows (exception-handling aside): 


public void Deconstruct (out string firstName, out string lastName)
{
  int spacePos = name.IndexOf (' ');
  firstName = name.Substring (0, spacePos);
  lastName = name.Substring (spacePos + 1);
}


Deconstructors are called with the following special syntax: 


var joe = new Person ("Joe Bloggs");
var (first, last) = joe;      // Deconstruction
Console.WriteLine (first);    // Joe
Console.WriteLine (last);     // Bloggs
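Putting the pieces together, here is a complete sketch of the Person class with its deconstructor (the finalizer is omitted). As in the text, exception handling is left out: a name without a space makes Substring throw. DeconstructDemo is a hypothetical name:

```csharp
using System;

public class Person
{
    string name;

    public Person (string name) => Name = name;

    public string Name
    {
        get => name;
        set => name = value ?? "";
    }

    // Assumes the name contains a space; if not, IndexOf returns -1
    // and Substring throws ArgumentOutOfRangeException.
    public void Deconstruct (out string firstName, out string lastName)
    {
        int spacePos = name.IndexOf (' ');
        firstName = name.Substring (0, spacePos);
        lastName = name.Substring (spacePos + 1);
    }
}

static class DeconstructDemo
{
    static void Main()
    {
        var (first, last) = new Person ("Joe Bloggs");   // calls Deconstruct
        Console.WriteLine (first);   // Joe
        Console.WriteLine (last);    // Bloggs
    }
}
```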










Tuples 


Perhaps the most notable improvement to C# 7 is explicit tuple support (see 
“Tuples” on page 197 in Chapter 4). Tuples provide a simple way to store a set of 
related values: 


var bob = ("Bob", 23); 
Console.WriteLine (bob.Item1); // Bob 
Console.WriteLine (bob.Item2); // 23 


C#’s new tuples are syntactic sugar for using the System.ValueTuple<..> generic 
structs. But thanks to compiler magic, tuple elements can be named: 


var tuple = (name:"Bob", age:23);
Console.WriteLine (tuple.name);   // Bob
Console.WriteLine (tuple.age);    // 23


With tuples, functions can return multiple values without resorting to out parame- 
ters or extra type baggage: 


static (int row, int column) GetFilePosition() => (3, 10);

static void Main()
{
  var pos = GetFilePosition();
  Console.WriteLine (pos.row);      // 3
  Console.WriteLine (pos.column);   // 10
}


Tuples implicitly support the deconstruction pattern, so you can easily deconstruct 
them into individual variables: 


static void Main()
{
  (int row, int column) = GetFilePosition();   // Creates 2 local variables
  Console.WriteLine (row);      // 3
  Console.WriteLine (column);   // 10
}
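As a further sketch, a function that computes two results in one pass and returns them as a named tuple. The caller deconstructs the tuple and can discard an element it does not need. TupleDemo and MinMax are hypothetical names, and MinMax assumes a non-empty array:

```csharp
using System;

static class TupleDemo
{
    // Returns both extremes at once; assumes values is non-empty
    // (values[0] throws IndexOutOfRangeException otherwise).
    public static (int min, int max) MinMax (int[] values)
    {
        int min = values[0], max = values[0];
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }

    static void Main()
    {
        var (lo, hi) = MinMax (new[] { 4, -2, 9, 3 });   // deconstruct into two locals
        Console.WriteLine ($"{lo} {hi}");                 // -2 9

        var (_, onlyMax) = MinMax (new[] { 1, 5 });       // discard the minimum
        Console.WriteLine (onlyMax);                      // 5
    }
}
```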
throw expressions 


Prior to C# 7, throw was always a statement. Now it can also appear as an expression 
in expression-bodied functions: 


public string Foo() => throw new NotImplementedException();

A throw expression can also appear in a ternary conditional expression:

string Capitalize (string value) =>
  value == null ? throw new ArgumentException ("value") :
  value == "" ? "" :
  char.ToUpper (value[0]) + value.Substring (1);
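A runnable version of Capitalize, plus a throw expression used as the right operand of the null-coalescing operator (??), which is another place C# 7 permits one. ThrowExprDemo and RequireName are hypothetical names:

```csharp
using System;

static class ThrowExprDemo
{
    public static string Capitalize (string value) =>
        value == null ? throw new ArgumentException ("value") :
        value == "" ? "" :
        char.ToUpper (value[0]) + value.Substring (1);

    // A throw expression also works on the right side of ??.
    public static string RequireName (string name) =>
        name ?? throw new ArgumentNullException (nameof (name));

    static void Main()
    {
        Console.WriteLine (Capitalize ("hello"));   // Hello
        Console.WriteLine (RequireName ("Bob"));    // Bob
    }
}
```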
What's New in C# 6.0


C# 6.0, which shipped with Visual Studio 2015, features a new-generation compiler, 
completely written in C#. Known as project “Roslyn,” the new compiler exposes the 
entire compilation pipeline via libraries, allowing you to perform code analysis on 
arbitrary source code (see Chapter 27). The compiler itself is open source, and the 
source code is available on GitHub. 


In addition, C# 6.0 features several minor but significant enhancements, aimed pri- 
marily at reducing code clutter. 


The null-conditional (“Elvis”) operator (see “Null Operators” on page 69 in Chap- 
ter 2) avoids having to explicitly check for null before calling a method or accessing 
a type member. In the following example, result evaluates to null instead of throw- 
ing a NullReferenceException: 


System.Text.StringBuilder sb = null; 
string result = sb?.ToString(); // result is null 
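The null-conditional operator is often combined with ?? to supply a fallback. When the member it reaches is a value type, the result becomes nullable. A sketch (NullConditionalDemo, Describe, and LengthOf are hypothetical names):

```csharp
using System;
using System.Text;

static class NullConditionalDemo
{
    // ?. yields null instead of throwing when sb is null; ?? then supplies a fallback.
    public static string Describe (StringBuilder sb) => sb?.ToString() ?? "(nothing)";

    // Length is an int, so s?.Length has type int? (null when s is null).
    public static int? LengthOf (string s) => s?.Length;

    static void Main()
    {
        Console.WriteLine (Describe (null));                       // (nothing)
        Console.WriteLine (Describe (new StringBuilder ("hi")));   // hi
        Console.WriteLine (LengthOf (null) == null);               // True
    }
}
```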


Expression-bodied functions (see “Methods” on page 92 in Chapter 3) allow methods, 
properties, operators, and indexers that comprise a single expression to be written 
more tersely, in the style of a lambda expression: 


public int TimesTwo (int x) => x * 2; 
public string SomeProperty => "Property value"; 


Property initializers (Chapter 3) let you assign an initial value to an automatic
property:

public DateTime TimeCreated { get; set; } = DateTime.Now;

Initialized properties can also be read-only:

public DateTime TimeCreated { get; } = DateTime.Now;

Read-only properties can also be set in the constructor, making it easier to create
immutable (read-only) types.
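A sketch of an immutable type built from these C# 6 features: an initialized read-only property, a getter-only property assigned in the constructor, and expression-bodied members. The Order class is hypothetical:

```csharp
using System;

// Hypothetical type used only to illustrate C# 6 property features.
public class Order
{
    public DateTime TimeCreated { get; } = DateTime.Now;   // initialized read-only auto property
    public string Customer { get; }                         // getter-only: assignable only in a constructor

    public Order (string customer) { Customer = customer; }

    public string Summary => "Order for " + Customer;       // expression-bodied property
    public int TimesTwo (int x) => x * 2;                    // expression-bodied method
}

static class OrderDemo
{
    static void Main()
    {
        var order = new Order ("Ann");
        Console.WriteLine (order.Summary);   // Order for Ann
        // order.Customer = "Bob";           // would not compile: no setter
    }
}
```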


Index initializers (Chapter 4) allow single-step initialization of any type that exposes 
an indexer: 


var dict = new Dictionary<int,string>()
{
  [3] = "three",
  [10] = "ten"
};

String interpolation (see “String Type” on page 46 in Chapter 2) offers a succinct
alternative to string.Format:


string s = $"It is {DateTime.Now.DayOfWeek} today"; 
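A sketch that combines an index initializer with an interpolated string. It also shows that a format string can follow a colon inside the braces (here D3, which pads an integer to three digits). InitializerDemo, Numbers, and Describe are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class InitializerDemo
{
    // Each [key] = value entry assigns through the dictionary's indexer.
    public static Dictionary<int, string> Numbers() => new Dictionary<int, string>
    {
        [3] = "three",
        [10] = "ten"
    };

    // Interpolated string with a format specifier after the colon.
    public static string Describe (int key) => $"{key:D3} is {Numbers()[key]}";

    static void Main() => Console.WriteLine (Describe (3));   // 003 is three
}
```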


Exception filters (see “try Statements and Exceptions” on page 170 in Chapter 4) let 
you apply a condition to a catch block: 
string html;
try
{
  html = new WebClient().DownloadString ("http://asef");
}
catch (WebException ex) when (ex.Status == WebExceptionStatus.Timeout)
{
  ...
}
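A network-free sketch of the same mechanism. The first catch block runs only when its filter is true; otherwise the runtime moves on to the next matching catch clause. ExceptionFilterDemo and Classify are hypothetical names, and ParamName stands in for the Status check above:

```csharp
using System;

static class ExceptionFilterDemo
{
    public static string Classify (Func<int> action)
    {
        try
        {
            return "result " + action();
        }
        catch (ArgumentException ex) when (ex.ParamName == "count")
        {
            return "bad count";                // filter matched
        }
        catch (ArgumentException)
        {
            return "other argument problem";   // filter did not match
        }
    }

    static void Main()
    {
        Console.WriteLine (Classify (() => 42));   // result 42
        Console.WriteLine (Classify (() => throw new ArgumentException ("bad", "count")));   // bad count
    }
}
```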


The using static (see “Namespaces” on page 81 in Chapter 2) directive lets you 
import all the static members of a type so that you can use those members 
unqualified: 


using static System.Console; 


WriteLine ("Hello, world"); // WriteLine instead of Console.WriteLine 


The nameof (Chapter 3) operator returns the name of a variable, type, or other sym- 
bol as a string. This avoids breaking code when you rename a symbol in Visual 
Studio: 


int capacity = 123; 
string x = nameof (capacity); // x is "capacity" 
string y = nameof (Uri.Host); // y is "Host" 
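A common use of nameof is naming the offending parameter in an argument exception, so the string stays correct if the parameter is renamed. A sketch (NameofDemo and CreateBuffer are hypothetical names):

```csharp
using System;

static class NameofDemo
{
    public static int[] CreateBuffer (int capacity)
    {
        if (capacity < 0)
            // nameof (capacity) evaluates to "capacity" at compile time.
            throw new ArgumentOutOfRangeException (nameof (capacity));
        return new int[capacity];
    }

    static void Main() => Console.WriteLine (CreateBuffer (3).Length);   // 3
}
```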


And finally, you're now allowed to await inside catch and finally blocks. 


What's New in C# 5.0 


C# 5.0’s big new feature was support for asynchronous functions via two new key- 
words, async and await. Asynchronous functions enable asynchronous continua- 
tions, which make it easier to write responsive and thread-safe rich-client 
applications. They also make it easy to write highly concurrent and efficient I/O- 
bound applications that don’t tie up a thread resource per operation. 


We cover asynchronous functions in detail in Chapter 14. 
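As a first taste, here is a minimal sketch of an asynchronous function. The await releases the calling thread while the delay is pending, and execution resumes after it completes. AsyncDemo and DelayedDoubleAsync are hypothetical names, and async Main requires C# 7.1 or later:

```csharp
using System;
using System.Threading.Tasks;

static class AsyncDemo
{
    // The method returns a Task<int> immediately; the rest runs as a continuation.
    public static async Task<int> DelayedDoubleAsync (int x)
    {
        await Task.Delay (100);   // no thread is blocked while waiting
        return x * 2;
    }

    static async Task Main()
    {
        int result = await DelayedDoubleAsync (21);
        Console.WriteLine (result);   // 42
    }
}
```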


What's New in C# 4.0 


C# 4.0 introduced four major enhancements: 


• Dynamic binding (Chapters 4 and 20) defers binding (the process of resolving
  types and members) from compile time to runtime and is useful in scenarios
  that would otherwise require complicated reflection code. Dynamic binding is
  also useful when interoperating with dynamic languages and COM components.

• Optional parameters (Chapter 2) allow functions to specify default parameter
  values so that callers can omit arguments, and named arguments allow a
  function caller to identify an argument by name rather than position.

• Type variance rules were relaxed in C# 4.0 (Chapters 3 and 4), such that type
  parameters in generic interfaces and generic delegates can be marked as
  covariant or contravariant, allowing more natural type conversions.

• COM interoperability (Chapter 25) was enhanced in C# 4.0 in three ways. First,
  arguments can be passed by reference without the ref keyword (particularly
  useful in conjunction with optional parameters). Second, assemblies that
  contain COM interop types can be linked rather than referenced. Linked interop
  types support type equivalence, avoiding the need for Primary Interop
  Assemblies and putting an end to versioning and deployment headaches. Third,
  functions that return COM-Variant types from linked interop types are mapped
  to dynamic rather than object, eliminating the need for casting.
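A sketch touching three of these features: optional and named parameters, dynamic binding, and covariance. The COM features need a COM server, so they are not shown. CSharp4Demo and its members are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CSharp4Demo
{
    // Optional parameters: prefix and width have defaults, so callers may omit them.
    public static string Format (int value, string prefix = "#", int width = 4) =>
        prefix + value.ToString().PadLeft (width, '0');

    // Dynamic binding: d.Length is resolved at runtime against the actual object.
    public static int DynamicLength (dynamic d) => d.Length;

    // Covariance: IEnumerable<out T> lets a List<string> be passed as IEnumerable<object>.
    public static int CountObjects (IEnumerable<object> items) => items.Count();

    static void Main()
    {
        Console.WriteLine (Format (7));                 // #0007
        Console.WriteLine (Format (7, width: 2));       // #07 (named argument skips prefix)
        Console.WriteLine (DynamicLength ("hello"));    // 5
        Console.WriteLine (DynamicLength (new int[3])); // 3
        Console.WriteLine (CountObjects (new List<string> { "a", "b" }));   // 2
    }
}
```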


What's New in C# 3.0


The features added to C# 3.0 were mostly centered on Language-Integrated Query
(LINQ) capabilities. LINQ enables queries to be written directly within a C#
program and checked statically for correctness, and it can query both local
collections (such as lists or XML documents) and remote data sources (such as a
database). The C# 3.0 features added to support LINQ comprised implicitly typed
local variables, anonymous types, object initializers, lambda expressions, extension
methods, query expressions, and expression trees.


Implicitly typed local variables (var keyword, Chapter 2) let you omit the variable 
type in a declaration statement, allowing the compiler to infer it. This reduces clut- 
ter as well as allowing anonymous types (Chapter 4), which are simple classes cre- 
ated on the fly that are commonly used in the final output of LINQ queries. You can 
also implicitly type arrays (Chapter 2). 


Object initializers (Chapter 3) simplify object construction by allowing you to set 
properties inline after the constructor call. Object initializers work with both named 
and anonymous types. 


Lambda expressions (Chapter 4) are miniature functions created by the compiler on 
the fly; they are particularly useful in “fluent” LINQ queries (Chapter 8). 


Extension methods (Chapter 4) extend an existing type with new methods (without 
altering the type’s definition), making static methods feel like instance methods. 
LINQ’s query operators are implemented as extension methods. 


Query expressions (Chapter 8) provide a higher-level syntax for writing LINQ quer- 
ies that can be substantially simpler when working with multiple sequences or range 
variables. 
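A sketch that ties several of these features together: an extension method, lambda expressions passed to LINQ operators, the equivalent query expression, and an implicitly typed local holding an anonymous type. StringExtensions, IsCapitalized, and LinqDemo are hypothetical names:

```csharp
using System;
using System.Linq;

static class StringExtensions
{
    // 'this' on the first parameter lets callers write name.IsCapitalized().
    public static bool IsCapitalized (this string s) =>
        !string.IsNullOrEmpty (s) && char.IsUpper (s[0]);
}

static class LinqDemo
{
    public static readonly string[] Names = { "Tom", "dick", "Harry" };

    // Fluent syntax: lambdas passed to extension-method query operators.
    public static string[] Fluent() =>
        Names.Where (n => n.IsCapitalized())
             .OrderBy (n => n.Length)
             .Select (n => n.ToUpperInvariant())
             .ToArray();

    // The same query as a query expression, which the compiler
    // translates into Where/OrderBy/Select calls.
    public static string[] QuerySyntax() =>
        (from n in Names
         where n.IsCapitalized()
         orderby n.Length
         select n.ToUpperInvariant()).ToArray();

    static void Main()
    {
        // var holding an anonymous type created on the fly.
        var summary = new { Count = Fluent().Length, First = Fluent()[0] };
        Console.WriteLine (summary);                             // { Count = 2, First = TOM }
        Console.WriteLine (string.Join (",", QuerySyntax()));   // TOM,HARRY
    }
}
```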


Expression trees (Chapter 8) are miniature code Document Object Models (DOMs)
that describe lambda expressions assigned to the special type
Expression<TDelegate>. Expression trees make it possible for LINQ queries to execute
remotely (e.g., on a database server) because they can be introspected and translated
at runtime (e.g., into a SQL statement).
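A small sketch of that introspection. The same lambda, assigned to Expression<Func<int,bool>>, becomes a tree whose nodes can be inspected, and it can still be compiled into a delegate. ExpressionTreeDemo and IsEven are hypothetical names:

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionTreeDemo
{
    // A data structure describing x => x % 2 == 0, not compiled code.
    public static readonly Expression<Func<int, bool>> IsEven = x => x % 2 == 0;

    static void Main()
    {
        Console.WriteLine (IsEven.Body.NodeType);        // Equal
        Console.WriteLine (IsEven.Parameters[0].Name);   // x

        // The tree can also be compiled into an ordinary delegate.
        Func<int, bool> compiled = IsEven.Compile();
        Console.WriteLine (compiled (4));                // True
    }
}
```

A LINQ provider would walk the same tree (Body, Parameters, and so on) to produce, for example, SQL instead of calling Compile.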


C# 3.0 also added automatic properties and partial methods. 


Automatic properties (Chapter 3) cut the work in writing properties that simply 
get/set a private backing field by having the compiler do that work automatically. 
Partial methods (Chapter 3) let an autogenerated partial class provide customizable 
hooks for manual authoring which “melt away” if unused. 


What's New in C# 2.0 


The big new features in C# 2 were generics (Chapter 3), nullable value types (Chap- 
ter 4), iterators (Chapter 4), and anonymous methods (the predecessor to lambda 
expressions). These features paved the way for the introduction of LINQ in C# 3. 


C# 2 also added support for partial classes, static classes, and a host of minor and 
miscellaneous features such as the namespace alias qualifier, friend assemblies, and 
fixed-size buffers. 


The introduction of generics required a new CLR (CLR 2.0), because generics main- 
tain full type fidelity at runtime. 
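For orientation, here is a sketch of the four headline C# 2 features side by side: a generic method, an iterator, a nullable value type, and an anonymous method. CSharp2Demo and its members are hypothetical names. Func<int,int> itself arrived later, in .NET Framework 3.5, but it is available in .NET Core:

```csharp
using System;
using System.Collections.Generic;

static class CSharp2Demo
{
    // Generic method: type-safe for any T that can compare itself, with no casts.
    public static T Max<T> (T a, T b) where T : IComparable<T> =>
        a.CompareTo (b) >= 0 ? a : b;

    // Iterator: yield return produces elements lazily, one per MoveNext call.
    public static IEnumerable<int> Evens (int limit)
    {
        for (int i = 0; i <= limit; i += 2)
            yield return i;
    }

    // Nullable value type: int? can represent "no value".
    public static int? HalfIfEven (int x) => x % 2 == 0 ? x / 2 : (int?)null;

    // Anonymous method: the C# 2 predecessor of lambda expressions.
    public static readonly Func<int, int> Square = delegate (int n) { return n * n; };

    static void Main()
    {
        Console.WriteLine (Max (3, 7));                      // 7
        Console.WriteLine (string.Join (",", Evens (6)));   // 0,2,4,6
        Console.WriteLine (HalfIfEven (3).HasValue);         // False
        Console.WriteLine (Square (5));                      // 25
    }
}
```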
C# Language Basics


In this chapter, we introduce the basics of the C# language. 


All programs and code snippets in this and the following two 
chapters are available as interactive samples in LINQPad. 
Working through these samples in conjunction with the book 
accelerates learning in that you can edit the samples and 
instantly see the results without needing to set up projects and 
solutions in Visual Studio. 


To download them in LINQPad, click the Samples tab, and 
then click “Download more samples.” 


A First C# Program


Following is a program that multiplies 12 by 30 and prints the result, 360, to the 
screen. The double forward slash indicates that the remainder of a line is a 
comment: 


using System; // Importing namespace 
class Test // Class declaration 
{ 
static void Main() // Method declaration 
{ 
int x = 12 * 30; // Statement 1 
Console.WriteLine (x); // Statement 2 
} // End of method 
} // End of class 


At the heart of this program lie two statements: 


int x = 12 * 30; 
Console.WriteLine (x); 
Statements in C# execute sequentially and are terminated by a semicolon (or a code
block, as you'll see later). The first statement computes the expression 12 * 30 and 
stores the result in a local variable, named x, which is an integer type. The second 
statement calls the Console class's WriteLine method, to print the variable x to a text 
window on the screen. 


A method performs an action in a series of statements, called a statement block—a 
pair of braces containing zero or more statements. We defined a single method 
named Main: 


static void Main()
{
  ...
}


Writing higher-level functions that call upon lower-level functions simplifies a pro- 
gram. We can refactor our program with a reusable method that multiplies an inte- 
ger by 12, as follows: 


using System;

class Test
{
  static void Main()
  {
    Console.WriteLine (FeetToInches (30));    // 360
    Console.WriteLine (FeetToInches (100));   // 1200
  }

  static int FeetToInches (int feet)
  {
    int inches = feet * 12;
    return inches;
  }
}
A method can receive input data from the caller by specifying parameters and output 
data back to the caller by specifying a return type. We defined a method called 
FeetToInches that has a parameter for inputting feet, and a return type for out- 
putting inches: 


static int FeetToInches (int feet) {...}


The literals 30 and 100 are the arguments passed to the FeetToInches method. The 
Main method in our example has empty parentheses because it has no parameters; it 
is void because it doesn’t return any value to its caller: 


static void Main() 


C# recognizes a method called Main as signaling the default entry point of execu- 
tion. The Main method can optionally return an integer (rather than void) in order 
to return a value to the execution environment (where a nonzero value typically 
indicates an error). The Main method can also optionally accept an array of strings 
as a parameter (that will be populated with any arguments passed to the executable);
for example: 


static int Main (string[] args) {...} 


An array (such as string[]) represents a fixed number of ele- 
ments of a particular type. Arrays are specified by placing 
square brackets after the element type. We describe them in 
“Arrays” on page 48. 


(The Main method can also be declared async and return a Task or Task<int> in 
support of asynchronous programming, which we cover in Chapter 14.) 
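A minimal sketch of a Main that reads its arguments and reports an exit code. ArgsDemo is a hypothetical name. With the dotnet tool, arguments placed after -- are passed to the program (for example, dotnet run -- alpha beta):

```csharp
using System;

static class ArgsDemo
{
    // args holds the command-line arguments; the return value becomes the exit code.
    public static int Main (string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine ("No arguments supplied");
            return 1;   // nonzero typically indicates an error
        }
        foreach (string arg in args)
            Console.WriteLine (arg);
        return 0;
    }
}
```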


Methods are one of several kinds of functions in C#. Another kind of function we 
used in our example program was the * operator, which performs multiplication. 
There are also constructors, properties, events, indexers, and finalizers. 


In our example, the two methods are grouped into a class. A class groups function 
members and data members to form an object-oriented building block. The 
Console class groups members that handle command-line input/output (I/O) func- 
tionality, such as the WriteLine method. Our Test class groups two methods—the 
Main method and the FeetToInches method. A class is a kind of type, which we 
examine in “Type Basics” on page 27. 


At the outermost level of a program, types are organized into namespaces. The 
using directive makes the System namespace available to our application, to use the 
Console class. We could define all of our classes within the TestPrograms name- 
space, as follows: 


using System;

namespace TestPrograms
{
  class Test {...}
  class Test2 {...}
}


The .NET Core libraries are organized into nested namespaces. For example, this is 
the namespace that contains types for handling text: 


using System.Text; 


The using directive is there for convenience; you can also refer to a type by its fully
qualified name, which is the type name prefixed with its namespace, such as
System.Text.StringBuilder.
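To make the point concrete, here is a sketch with no using directives at all. Every type is referred to by its fully qualified name. FullyQualifiedDemo and Build are hypothetical names:

```csharp
// No using directives: System.Text.StringBuilder and System.Console
// are written out in full.
class FullyQualifiedDemo
{
    public static string Build()
    {
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        sb.Append ("Hello").Append (", ").Append ("world");
        return sb.ToString();
    }

    static void Main() => System.Console.WriteLine (Build());   // Hello, world
}
```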


Compilation 


The C# compiler compiles source code (as a set of files with the .cs extension) into 
an assembly. An assembly is the unit of packaging and deployment in .NET. An 
assembly can be either an application or a library. A normal console or Windows 
application has a Main method (the entry point), whereas a library does not. The 
purpose of a library is to be called upon (referenced) by an application or by other
libraries. .NET Core itself is a set of assemblies (as well as a runtime environment). 


Unlike .NET Framework, .NET Core assemblies never have
a .exe extension. The .exe you might see after building a .NET
Core application is a platform-specific native loader responsible
for starting your application’s .dll assembly.


.NET Core also allows you to create a self-contained deploy- 
ment that includes the loader, your assemblies, and the .NET 
Core Framework—all in a single .exe file. 


The dotnet tool (dotnet.exe on Windows) helps you to manage .NET source code 
and binaries from the command line. You can use it to both build and run your pro- 
gram, as an alternative to using an Integrated Development Environment (IDE) 
such as Visual Studio or Visual Studio Code. 


You can obtain the dotnet tool either by installing the .NET Core SDK or by instal- 
ling Visual Studio. Its default location is %ProgramFiles%\dotnet on Windows 
or /usr/bin/dotnet on Ubuntu Linux. 


To compile an application, the dotnet tool requires a project file as well as one or 
more C# files. The following command scaffolds a new console project (creates its 
basic structure): 


dotnet new console -n MyFirstProgram


This creates a subfolder called MyFirstProgram containing a project file called 
MyFirstProgram.csproj and a C# file called Program.cs with a Main method that 
prints “Hello, world”. 


To build and run your program, run this command from the MyFirstProgram
folder:

dotnet run

Or, if you just want to build without running:

dotnet build MyFirstProgram.csproj

The output assembly will be written to a subdirectory under bin\Debug.


We explain assemblies in detail in Chapter 18. 


Syntax 


C# syntax is inspired by C and C++ syntax. In this section, we describe C#’s ele- 
ments of syntax, using the following program: 


using System;

class Test
{
  static void Main()
  {
    int x = 12 * 30;
    Console.WriteLine (x);
  }
}


Identifiers and Keywords 


Identifiers are names that programmers choose for their classes, methods, variables, 
and so on. Here are the identifiers in our example program, in the order in which 
they appear: 


System   Test   Main   x   Console   WriteLine


An identifier must be a whole word, essentially made up of Unicode characters
starting with a letter or underscore. C# identifiers are case sensitive. By convention,
parameters, local variables, and private fields should be in camel case (e.g.,
myVariable), and all other identifiers should be in Pascal case (e.g., MyMethod).


Keywords are names that mean something special to the compiler. These are the
keywords in our example program:

using   class   static   void   int


Most keywords are reserved, which means that you can't use them as identifiers. 
Here is the full list of C# reserved keywords: 


abstract   do         in          protected    true
as         double     int         public       try
base       else       interface   readonly     typeof
bool       enum       internal    ref          uint
break      event      is          return       ulong
byte       explicit   lock        sbyte        unchecked
case       extern     long        sealed       unsafe
catch      false      namespace   short        ushort
char       finally    new         sizeof       using
checked    fixed      null        stackalloc   virtual
class      float      object      static       void
const      for        operator    string       volatile
continue   foreach    out         struct       while
decimal    goto       override    switch
default    if         params      this
delegate   implicit   private     throw

Avoiding conflicts 


If you really want to use an identifier that clashes with a reserved keyword, you can 
do so by qualifying it with the @ prefix. For instance: 


class class {...}     // Illegal
class @class {...}    // Legal
The @ symbol doesn’t form part of the identifier itself. So, @myVariable is the same
as myVariable.


The @ prefix can be useful when consuming libraries written 
in other .NET languages that have different keywords. 
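A short sketch of @-prefixed identifiers in practice: keywords used as parameter names, and a local declared with @ but referenced without it. VerbatimIdentifierDemo, Describe, and SameVariable are hypothetical names:

```csharp
using System;

static class VerbatimIdentifierDemo
{
    // @class and @event are ordinary parameter names; the @ only escapes the keyword.
    public static string Describe (string @class, int @event) => @class + "#" + @event;

    public static int SameVariable()
    {
        int @myVariable = 5;
        return myVariable;   // same variable: the @ is not part of the name
    }

    static void Main()
    {
        Console.WriteLine (Describe ("Test", 3));   // Test#3
        Console.WriteLine (SameVariable());         // 5
    }
}
```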


Contextual keywords 


Some keywords are contextual, meaning that you also can use them as identifiers— 
without an @ symbol: 


add          dynamic   into      remove      where
alias        equals    join      select      yield
ascending    from      let       set
async        get       nameof    unmanaged
await        global    on        value
by           group     orderby   var
descending   in        partial   when


With contextual keywords, ambiguity cannot arise within the context in which they 
are used. 


Literals, Punctuators, and Operators 


Literals are primitive pieces of data lexically embedded into the program. The liter- 
als we used in our example program are 12 and 30. 


Punctuators help demarcate the structure of the program. These are the punctuators 
we used in our example program: 


{   }   ;

The braces group multiple statements into a statement block.


The semicolon terminates a statement. (Statement blocks, however, do not require a 
semicolon.) Statements can wrap multiple lines: 


Console.WriteLine
  (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10);


An operator transforms and combines expressions. Most operators in C# are deno- 
ted with a symbol, such as the multiplication operator, *. We discuss operators in 
more detail later in this chapter. These are the operators we used in our example 
program: 


.   (   )   *   =


A period denotes a member of something (or a decimal point with numeric literals).
Parentheses are used when declaring or calling a method; empty parentheses are
used when the method accepts no arguments. (Parentheses also have other purposes
that you'll see later in this chapter.) An equals sign performs assignment. (The
double equals sign, ==, performs equality comparison, as you'll see later.)


Comments 


C# offers two different styles of source-code documentation: single-line comments 
and multiline comments. A single-line comment begins with a double forward slash 
and continues until the end of the line; for example: 


int x = 3; // Comment about assigning 3 to x 
A multiline comment begins with /* and ends with */; for example: 


int x = 3; /* This is a comment that 
spans two lines */ 


Comments can embed XML documentation tags, which we explain in “XML Docu- 
mentation” on page 226 in Chapter 4. 


Type Basics 


A type defines the blueprint for a value. In our example, we used two literals of type 
int with values 12 and 30. We also declared a variable of type int whose name 
was x: 


static void Main()
{
  int x = 12 * 30;
  Console.WriteLine (x);
}


A variable denotes a storage location that can contain different values over time. In 
contrast, a constant always represents the same value (more on this later): 


const int y = 360; 


All values in C# are instances of a type. The meaning of a value and the set of possi- 
ble values a variable can have are determined by its type. 


Predefined Type Examples 


Predefined types are types that are specially supported by the compiler. The int 
type is a predefined type for representing the set of integers that fit into 32 bits of 
memory, from −2³¹ to 2³¹ − 1, and is the default type for numeric literals within this
range. We can perform functions such as arithmetic with instances of the int type, 
as follows: 
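The bounds are exposed as int.MinValue and int.MaxValue. A sketch that checks the range, confirms that an integer literal expression is typed as int, and shows wraparound past int.MaxValue. The wraparound assumes the default unchecked context, meaning the project has not enabled overflow checking. IntRangeDemo and its members are hypothetical names:

```csharp
using System;

static class IntRangeDemo
{
    // Number of distinct int values: 2^31 - (-2^31) = 2^32.
    public static long ValueCount => (long)int.MaxValue - int.MinValue + 1;

    // Integer literals within the int range are typed as int (System.Int32).
    public static Type LiteralType() => (12 * 30).GetType();

    // In the default unchecked context, int.MaxValue + 1 wraps to int.MinValue.
    public static int Increment (int x) => x + 1;

    static void Main()
    {
        Console.WriteLine (int.MinValue);          // -2147483648
        Console.WriteLine (int.MaxValue);          // 2147483647
        Console.WriteLine (LiteralType().Name);    // Int32
    }
}
```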


int x = 12 * 30; 


Another predefined C# type is string. The string type represents a sequence of
characters, such as “.NET” or “http://oreilly.com”. We can work with strings by
calling functions on them, as follows:
string message = "Hello world";
string upperMessage = message.ToUpper();
Console.WriteLine (upperMessage);   // HELLO WORLD

int x = 2015;
message = message + x.ToString();
Console.WriteLine (message);        // Hello world2015


The predefined bool type has exactly two possible values: true and false. The bool 
type is commonly used with an if statement to conditionally branch execution 
flow: 


bool simpleVar = false; 
if (simpleVar) 
Console.WriteLine ("This will not print"); 


int x = 5000;
bool lessThanAMile = x < 5280;
if (lessThanAMile)
  Console.WriteLine ("This will print");


In C#, predefined types (also referred to as built-in types) are 
recognized with a C# keyword. The System namespace 
in .NET Core contains many important types that are not pre- 
defined by C# (e.g., DateTime). 


Custom Type Examples 


Just as we can build complex functions from simple functions, we can build com- 
plex types from primitive types. In this next example, we define a custom type 
named UnitConverter—a class that serves as a blueprint for unit conversions: 


using System;

public class UnitConverter
{
  int ratio;                                                    // Field
  public UnitConverter (int unitRatio) { ratio = unitRatio; }   // Constructor
  public int Convert (int unit) { return unit * ratio; }        // Method
}

class Test
{
  static void Main()
  {
    UnitConverter feetToInchesConverter = new UnitConverter (12);
    UnitConverter milesToFeetConverter = new UnitConverter (5280);

    Console.WriteLine (feetToInchesConverter.Convert (30));    // 360
    Console.WriteLine (feetToInchesConverter.Convert (100));   // 1200
    Console.WriteLine (feetToInchesConverter.Convert (
                         milesToFeetConverter.Convert (1)));   // 63360
  }
}


Members of a type 


A type contains data members and function members. The data member of 
UnitConverter is the field called ratio. The function members of UnitConverter 
are the Convert method and the UnitConverter’s constructor. 


Symmetry of predefined types and custom types 


A beautiful aspect of C# is that predefined types and custom types have few differ- 
ences. The predefined int type serves as a blueprint for integers. It holds data—32 
bits—and provides function members that use that data, such as ToString. Simi- 
larly, our custom UnitConverter type acts as a blueprint for unit conversions. It 
holds data—the ratio—and provides function members to use that data. 


Constructors and instantiation 


Data is created by instantiating a type. Predefined types can be instantiated simply 
by using a literal such as 12 or "Hello world". The new operator creates instances of 
a custom type. We created and declared an instance of the UnitConverter type with 
this statement: 


UnitConverter feetToInchesConverter = new UnitConverter (12); 


Immediately after the new operator instantiates an object, the object’s constructor is 
called to perform initialization. A constructor is defined like a method, except that 
the method name and return type are reduced to the name of the enclosing type: 


public class UnitConverter
{
  ...
  public UnitConverter (int unitRatio) { ratio = unitRatio; }
  ...
}


Instance versus static members 


The data members and function members that operate on the instance of the type 
are called instance members. The UnitConverter’s Convert method and the int’s 
ToString method are examples of instance members. By default, members are 
instance members. 


Data members and function members that don't operate on the instance of the type 
but rather on the type itself must be marked as static. The Test.Main and 
Console.WriteLine methods are static methods. The Console class is actually a 
static class, which means that all of its members are static. You never actually create 
instances of a Console—one console is shared across the entire application. 
Let’s contrast instance members with static members. In the following code, the
instance field Name pertains to an instance of a particular Panda, whereas Population
pertains to the set of all Panda instances:


public class Panda
{
  public string Name;             // Instance field
  public static int Population;   // Static field

  public Panda (string n)         // Constructor
  {
    Name = n;                       // Assign the instance field
    Population = Population + 1;    // Increment the static Population field
  }
}


The following code creates two instances of the Panda, prints their names, and then 
prints the total population: 


using System;

class Test
{
  static void Main()
  {
    Panda p1 = new Panda ("Pan Dee");
    Panda p2 = new Panda ("Pan Dah");

    Console.WriteLine (p1.Name);            // Pan Dee
    Console.WriteLine (p2.Name);            // Pan Dah

    Console.WriteLine (Panda.Population);   // 2
  }
}


Attempting to evaluate p1.Population or Panda.Name will generate a compile-time 
error. 


The public keyword 


The public keyword exposes members to other classes. In this example, if the Name 
field in Panda were not marked as public, it would be private and the Test class could
not access it. Marking a member public is how a type communicates: “Here is what 
I want other types to see—everything else is my own private implementation 
details.” In object-oriented terms, we say that the public members encapsulate the 
private members of the class. 
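The sketch below shows the same idea with a hypothetical variant of Panda. Its name is kept in a field without an access modifier, which makes it private, and other types can reach it only through the public Describe method. Describe, EncapsulationDemo, and Run are illustrative names, not part of the original Panda type.

```csharp
using System;

// Hypothetical Panda variant: the field is private, the method is public
public class Panda
{
  string name;                              // No modifier, so private to Panda

  public Panda (string n) { name = n; }

  public string Describe() { return "Panda: " + name; }   // Public surface
}

public static class EncapsulationDemo
{
  public static string Run()
  {
    Panda p = new Panda ("Pan Dee");
    // Accessing p.name here would not compile: the field is inaccessible
    return p.Describe();
  }

  static void Main() => Console.WriteLine (Run());   // Panda: Pan Dee
}
```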


Conversions 


C# can convert between instances of compatible types. A conversion always creates 
a new value from an existing one. Conversions can be either implicit or explicit: 
implicit conversions happen automatically, and explicit conversions require a cast. 
In the following example, we implicitly convert an int to a long type (which has 
twice the bit capacity of an int), and we explicitly cast an int to a short type (which
has half the bit capacity of an int): 


int x = 12345; // int is a 32-bit integer 
long y = x; // Implicit conversion to 64-bit integer 
short z = (short)x; // Explicit conversion to 16-bit integer 


Implicit conversions are allowed when both of the following are true:

• The compiler can guarantee that they will always succeed.
• No information is lost in conversion.¹

Conversely, explicit conversions are required when one of the following is true:

• The compiler cannot guarantee that they will always succeed.
• Information might be lost during conversion.


(If the compiler can determine that a conversion will always fail, both kinds of
conversion are prohibited. Conversions that involve generics can also fail in certain
conditions—see “Type Parameters and Conversions” on page 142 in Chapter 3.) 


The numeric conversions that we just saw are built into the
language. C# also supports reference conversions and boxing
conversions (see Chapter 3) as well as custom conversions (see
“Operator Overloading” on page 216 in Chapter 4). The compiler
doesn’t enforce the aforementioned rules with custom
conversions, so it’s possible for badly designed types to behave
otherwise.
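The following sketch repeats the int-to-long and int-to-short conversions. It adds one value that does not fit in a short, which shows the information loss that makes an explicit cast necessary. ConversionDemo and its variable names are illustrative. The -31072 result assumes the default unchecked context, in which the cast keeps only the low 16 bits (100000 - 2 × 65536 = -31072).

```csharp
using System;

public static class ConversionDemo
{
  public static (long, short, short) Run()
  {
    int x = 12345;
    long y = x;                     // Implicit: every int fits in a long
    short z = (short)x;             // Explicit: 12345 happens to fit in 16 bits

    int big = 100000;               // Too large for a 16-bit short
    short truncated = (short)big;   // Explicit cast keeps only the low 16 bits

    return (y, z, truncated);
  }

  static void Main()
  {
    var (y, z, truncated) = Run();
    Console.WriteLine (y);          // 12345
    Console.WriteLine (z);          // 12345
    Console.WriteLine (truncated);  // -31072 (information was lost)
  }
}
```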


Value Types versus Reference Types 
All C# types fall into the following categories:

• Value types
• Reference types
• Generic type parameters
• Pointer types


In this section, we describe value types and reference types. 
We cover generic type parameters in “Generics” on page 135 
in Chapter 3, and pointer types in “Unsafe Code and Pointers” 
on page 219 in Chapter 4. 





1 A minor caveat is that very large long values lose some precision when converted to double.
Value types comprise most built-in types (specifically, all numeric types, the char
type, and the bool type) as well as custom struct and enum types. 


Reference types comprise all class, array, delegate, and interface types. (This includes 
the predefined string type.) 


The fundamental difference between value types and reference types is how they are 
handled in memory. 


Value types 


The content of a value-type variable or constant is simply a value. For example, the 
content of the built-in value type, int, is 32 bits of data. 


You can define a custom value type with the struct keyword (see Figure 2-1): 
public struct Point { public int X; public int Y; } 
or more tersely: 


public struct Point { public int X, Y; } 





Figure 2-1. A value-type instance in memory


The assignment of a value-type instance always copies the instance; for example: 


static void Main()
{
  Point p1 = new Point();
  p1.X = 7;

  Point p2 = p1;             // Assignment causes copy

  Console.WriteLine (p1.X);  // 7
  Console.WriteLine (p2.X);  // 7

  p1.X = 9;                  // Change p1.X

  Console.WriteLine (p1.X);  // 9
  Console.WriteLine (p2.X);  // 7
}
Figure 2-2 shows that p1 and p2 have independent storage.


Figure 2-2. Assignment copies a value-type instance
Reference types


A reference type is more complex than a value type, having two parts: an object and 
the reference to that object. The content of a reference-type variable or constant is a 
reference to an object that contains the value. Here is the Point type from our previous
example rewritten as a class rather than a struct (shown in Figure 2-3):


public class Point { public int X, Y; } 
Figure 2-3. A reference-type instance in memory


Assigning a reference-type variable copies the reference, not the object instance. 
This allows multiple variables to refer to the same object—something not ordinarily 
possible with value types. If we repeat the previous example, but with Point now a
class, an operation on p1 affects p2:


static void Main()
{
  Point p1 = new Point();
  p1.X = 7;

  Point p2 = p1;             // Copies p1 reference

  Console.WriteLine (p1.X);  // 7
  Console.WriteLine (p2.X);  // 7

  p1.X = 9;                  // Change p1.X

  Console.WriteLine (p1.X);  // 9
  Console.WriteLine (p2.X);  // 9
}


Figure 2-4 shows that p1 and p2 are two references that point to the same object. 
Figure 2-4. Assignment copies a reference


Null


A reference can be assigned the literal null, indicating that the reference points to 
no object: 


class Point {...} 


Point p = null; 
Console.WriteLine (p == null); // True 


// The following line generates a runtime error 
// (a NullReferenceException is thrown): 
Console.WriteLine (p.X); 


C# 8 introduces a new feature to reduce accidental
NullReferenceException errors. For more on this, see “Nullable
Reference Types (C# 8)” on page 191 in Chapter 4.


In contrast, a value type cannot ordinarily have a null value: 


struct Point {...} 


Point p = null; // Compile-time error 
int x = null; // Compile-time error 


C# also has a construct called nullable value types for
representing value-type nulls. For more information, see
“Nullable Value Types” in Chapter 4.
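As a brief preview of nullable value types, the sketch below relies only on the int? shorthand for Nullable<int> and its HasValue and Value members. Chapter 4 covers the full rules. NullableDemo and Describe are illustrative names.

```csharp
using System;

public static class NullableDemo
{
  // int? can hold any int value or null
  public static string Describe (int? value) =>
    value.HasValue ? "Has value " + value.Value : "Null";

  static void Main()
  {
    int? x = null;                       // Allowed, unlike "int x = null;"
    Console.WriteLine (x == null);       // True
    Console.WriteLine (Describe (x));    // Null
    Console.WriteLine (Describe (5));    // Has value 5
  }
}
```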


Storage overhead 


Value-type instances occupy precisely the memory required to store their fields. In 
this example, Point takes eight bytes of memory: 


struct Point 


{ 
int x; // 4 bytes 
int y; // 4 bytes 
} 


Technically, the CLR positions fields within the type at an 
address that’s a multiple of the field’s size (up to a maximum
of eight bytes). Thus, the following actually consumes 16 bytes 
of memory (with the seven bytes following the first field 
“wasted”): 


struct A { byte b; long l; }


You can override this behavior by applying the StructLayout 
attribute (see “Mapping a Struct to Unmanaged Memory” on 
page 984 in Chapter 25). 
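One way to observe these sizes is Marshal.SizeOf<T>. Strictly speaking, it reports the size of a struct’s unmanaged (marshaled) layout. For simple blittable structs like the ones below, that is a reasonable proxy for the layout just described. The fields are made public here only to avoid unused-field warnings, and the Packed struct is an illustrative use of StructLayout with Pack = 1.

```csharp
using System;
using System.Runtime.InteropServices;

public struct Point { public int X; public int Y; }      // Two 4-byte fields

public struct A { public byte B; public long L; }        // 1-byte field, then an 8-byte field

[StructLayout (LayoutKind.Sequential, Pack = 1)]
public struct Packed { public byte B; public long L; }   // Same fields, packing disabled

public static class LayoutDemo
{
  static void Main()
  {
    Console.WriteLine (Marshal.SizeOf<Point>());    // 8
    Console.WriteLine (Marshal.SizeOf<A>());        // 16 (7 padding bytes after B)
    Console.WriteLine (Marshal.SizeOf<Packed>());   // 9
  }
}
```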
Reference types require separate allocations of memory for the reference and object.
The object consumes as many bytes as its fields, plus additional administrative
overhead. The precise overhead is intrinsically private to the implementation of
the .NET runtime, but at minimum, the overhead is eight bytes, used to store a key
to the object’s type as well as temporary information such as its lock state for
multithreading and a flag to indicate whether it has been fixed from movement by the
garbage collector. Each reference to an object requires an extra four or eight bytes,
depending on whether the .NET runtime is running on a 32- or 64-bit platform.


Predefined Type Taxonomy 
The predefined types in C# are as follows: 


Value types
  • Numeric
      - Signed integer (sbyte, short, int, long)
      - Unsigned integer (byte, ushort, uint, ulong)
      - Real number (float, double, decimal)
  • Logical (bool)
  • Character (char)

Reference types
  • String (string)
  • Object (object)


Predefined types in C# alias .NET Core types in the System namespace. There is
only a syntactic difference between these two statements: 

int i = 5; 

System.Int32 i = 5; 
The predefined value types, excluding decimal, are known as primitive types in
the CLR. Primitive types are so called because they are supported directly via
instructions in compiled code, and this usually translates to direct support on the
underlying processor; for example:


                    // Underlying hexadecimal representation

int i = 7;          // 0x7
bool b = true;      // 0x1
char c = 'A';       // 0x41
float f = 0.5f;     // uses IEEE floating-point encoding


The System.IntPtr and System.UIntPtr types are also primitive (see Chapter 25).
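Both points can be checked through reflection. In the sketch below, comparing typeof results confirms that the C# keyword and the .NET type name refer to the same type, and Type.IsPrimitive reports which types the CLR treats as primitive. PrimitiveDemo is an illustrative name.

```csharp
using System;

public static class PrimitiveDemo
{
  static void Main()
  {
    // The C# keyword and the System type name denote the same type
    Console.WriteLine (typeof (int) == typeof (System.Int32));   // True

    // Type.IsPrimitive is true for the CLR primitive types...
    Console.WriteLine (typeof (int).IsPrimitive);       // True
    Console.WriteLine (typeof (char).IsPrimitive);      // True
    Console.WriteLine (typeof (IntPtr).IsPrimitive);    // True

    // ...but false for decimal, and for string (a reference type)
    Console.WriteLine (typeof (decimal).IsPrimitive);   // False
    Console.WriteLine (typeof (string).IsPrimitive);    // False
  }
}
```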
Numeric Types


C# has the predefined numeric types shown in Table 2-1. 


Table 2-1. Predefined numeric types in C#


C# type   System type   Suffix   Size       Range
Integral (signed)
sbyte     SByte                  8 bits     -2^7 to 2^7-1
short     Int16                  16 bits    -2^15 to 2^15-1
int       Int32                  32 bits    -2^31 to 2^31-1
long      Int64         L        64 bits    -2^63 to 2^63-1
Integral (unsigned)
byte      Byte                   8 bits     0 to 2^8-1
ushort    UInt16                 16 bits    0 to 2^16-1
uint      UInt32        U        32 bits    0 to 2^32-1
ulong     UInt64        UL       64 bits    0 to 2^64-1
Real
float     Single        F        32 bits    ±(~10^-45 to 10^38)
double    Double        D        64 bits    ±(~10^-324 to 10^308)
decimal   Decimal       M        128 bits   ±(~10^-28 to 10^28)
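Each numeric type exposes its limits through MinValue and MaxValue constants, so the integral ranges in the table can be checked directly. The following sketch prints a few of them. It also uses sizeof, which is permitted in safe code for the predefined numeric types and reports a size in bytes. RangeDemo is an illustrative name.

```csharp
using System;

public static class RangeDemo
{
  static void Main()
  {
    Console.WriteLine (sbyte.MinValue);         // -128                  (-2^7)
    Console.WriteLine (short.MaxValue);         // 32767                 (2^15 - 1)
    Console.WriteLine (int.MinValue);           // -2147483648           (-2^31)
    Console.WriteLine (uint.MaxValue);          // 4294967295            (2^32 - 1)
    Console.WriteLine (ulong.MaxValue);         // 18446744073709551615  (2^64 - 1)
    Console.WriteLine (sizeof (decimal) * 8);   // 128 (bits)
  }
}
```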








Of the integral types, int and long are first-class citizens and are favored by both C# 
and the runtime. The other integral types are typically used for interoperability or 
when space efficiency is paramount. 


Of the real number types, float and double are called floating-point types² and are
typically used for scientific and graphical calculations. The decimal type is typically
used for financial calculations, for which base-10-accurate arithmetic and high
precision are required.


Numeric Literals 


Integral-type literals can use decimal or hexadecimal notation; hexadecimal is
denoted with the 0x prefix; for example:


int x = 127;
long y = 0x7F;





2 Technically, decimal is a floating-point type, too, although it’s not referred to as such in the C# 
language specification. 
From C# 7, you can insert an underscore anywhere within a numeric literal to make
it more readable:


int million = 1_000_000;


C# 7 and above also lets you specify numbers in binary with the 0b prefix:


var b = 0b1010_1011_1100_1101_1110_1111;


Real literals can use decimal and/or exponential notation: 


double d = 1.5;
double million = 1E06; 
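The sketch below puts the literal forms side by side. Each group of four binary digits corresponds to one hexadecimal digit, so the binary literal above equals 0xABCDEF. LiteralDemo is an illustrative name.

```csharp
using System;

public static class LiteralDemo
{
  static void Main()
  {
    int    hex     = 0x7F;                              // Hexadecimal
    int    million = 1_000_000;                         // Underscores are ignored
    var    bin     = 0b1010_1011_1100_1101_1110_1111;   // Binary, inferred as int
    double exp     = 1E06;                              // Exponential notation

    Console.WriteLine (hex);              // 127
    Console.WriteLine (million);          // 1000000
    Console.WriteLine (bin == 0xABCDEF);  // True
    Console.WriteLine (exp == million);   // True (million is converted to double)
  }
}
```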





Numeric literal type inference
By default, the compiler infers a numeric literal to be either a double or an integral
type:

• If the literal contains a decimal point or the exponential symbol (E), it is
  a double.
• Otherwise, the literal’s type is the first type in this list that can fit the literal’s
  value: int, uint, long, and ulong.

For example: 


Console.WriteLine (        1.0.GetType());   // Double  (double)
Console.WriteLine (       1E06.GetType());   // Double  (double)
Console.WriteLine (          1.GetType());   // Int32   (int)
Console.WriteLine ( 0xF0000000.GetType());   // UInt32  (uint)
Console.WriteLine (0x100000000.GetType());   // Int64   (long)


Numeric suffixes 


Numeric suffixes explicitly define the type of a literal. Suffixes can be either
lowercase or uppercase, and are as follows:


Suffix   C# type   Example
F        float     float f = 1.0F;
D        double    double d = 1D;
M        decimal   decimal d = 1.0M;
U        uint      uint i = 1U;
L        long      long i = 1L;
UL       ulong     ulong i = 1UL;


The suffixes U and L are rarely necessary because the uint, long, and ulong types 
can nearly always be either inferred or implicitly converted from int: 


long i = 5; // Implicit lossless conversion from int literal to long 
The D suffix is technically redundant in that all literals with a decimal point are
inferred to be double. And you can always add a decimal point to a numeric literal: 


double x = 4.0; 


The F and M suffixes are the most useful and should always be applied when
specifying float or decimal literals. Without the F suffix, the following line would not
compile, because 4.5 would be inferred to be of type double, which has no implicit 
conversion to float: 


float f = 4.5F; 
The same principle is true for a decimal literal: 
decimal d = -1.23M; // Will not compile without the M suffix. 


We describe the semantics of numeric conversions in detail in the following section. 
Numeric Conversions 


Converting between integral types 


Integral type conversions are implicit when the destination type can represent every 
possible value of the source type. Otherwise, an explicit conversion is required; for 
example: 


int x = 12345; // int is a 32-bit integer 
long y = x; // Implicit conversion to 64-bit integral type 
short z = (short)x; // Explicit conversion to 16-bit integral type 


Converting between floating-point types 


A float can be implicitly converted to a double given that a double can represent 
every possible value of a float. The reverse conversion must be explicit. 


Converting between floating-point and integral types 
All integral types can be implicitly converted to all floating-point types: 


int i = 1; 
float f = i; 


The reverse conversion must be explicit: 
int i2 = (int)f;


When you cast from a floating-point number to an integral
type, any fractional portion is truncated; no rounding is
performed. The static class System.Convert provides methods
that round while converting between various numeric types
(see Chapter 6).
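The difference between a cast and a System.Convert method shows up with fractional values. In the following sketch, the cast truncates toward zero, while Convert.ToInt32(double) rounds. Its documented rule for midpoints is to return the even neighbor, which is why 2.5 becomes 2. RoundingDemo is an illustrative name.

```csharp
using System;

public static class RoundingDemo
{
  static void Main()
  {
    Console.WriteLine ((int)3.7);               // 3   (fraction discarded)
    Console.WriteLine ((int)-3.7);              // -3  (truncation is toward zero)
    Console.WriteLine (Convert.ToInt32 (3.7));  // 4   (rounded)
    Console.WriteLine (Convert.ToInt32 (2.5));  // 2   (midpoint rounds to even)
    Console.WriteLine (Convert.ToInt32 (3.5));  // 4   (midpoint rounds to even)
  }
}
```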


Implicitly converting a large integral type to a floating-point type preserves
magnitude but can occasionally lose precision. This is because floating-point types always
have more magnitude than integral types, but can have less precision. Rewriting our
example with a larger number demonstrates this:


int i1 = 100000001; 


float f = i1; // Magnitude preserved, precision lost 
int i2 = (int)f; // 100000000 
Decimal conversions 


All integral types can be implicitly converted to the decimal type given that a
decimal can represent every possible C# integral-type value. All other numeric
conversions to and from a decimal type must be explicit because they introduce the
possibility of either a value being out of range or precision being lost.


Arithmetic Operators 


The arithmetic operators (+, -, *, /, %) are defined for all numeric types except the 8- 
and 16-bit integral types: 


+ Addition 


- Subtraction 

* Multiplication 

/ Division 

% Remainder after division 
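Here is a quick sketch of the five operators on int operands. It shows that the remainder takes the sign of the dividend and that % also accepts real numbers. ArithmeticDemo is an illustrative name. The real-number case is printed as a comparison so that the output does not depend on the current culture’s decimal separator.

```csharp
using System;

public static class ArithmeticDemo
{
  static void Main()
  {
    Console.WriteLine (7 + 2);            // 9
    Console.WriteLine (7 - 2);            // 5
    Console.WriteLine (7 * 2);            // 14
    Console.WriteLine (7 / 2);            // 3  (integer division)
    Console.WriteLine (7 % 2);            // 1
    Console.WriteLine (-7 % 2);           // -1 (sign follows the dividend)
    Console.WriteLine (7.5 % 2 == 1.5);   // True
  }
}
```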


Increment and Decrement Operators 


The increment and decrement operators (++, --, respectively) increment and
decrement numeric types by 1. The operator can either follow or precede the variable,
depending on whether you want its value before or after the increment/decrement;
for example:

int x = 0, y = 0;

Console.WriteLine (x++); // Outputs 0; x is now 1 

Console.WriteLine (++y); // Outputs 1; y is now 1 


Specialized Operations on Integral Types 

The integral types are int, uint, long, ulong, short, ushort, byte, and sbyte.

Division

Division operations on integral types always truncate remainders (round toward
zero). Dividing by a variable whose value is zero generates a runtime error (a
DivideByZeroException):


int a = 2 / 3;      // 0
int b = 0;
int c = 5 / b;      // throws DivideByZeroException


Dividing by the literal or constant 0 generates a compile-time error. 
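The sketch below shows truncation toward zero with a negative operand, and a DivideByZeroException being caught at runtime. SafeDivide is a hypothetical helper. Returning 0 is only an illustrative fallback, not a recommended policy.

```csharp
using System;

public static class DivisionDemo
{
  // Hypothetical helper: returns 0 instead of propagating the exception
  public static int SafeDivide (int dividend, int divisor)
  {
    try
    {
      return dividend / divisor;
    }
    catch (DivideByZeroException)
    {
      return 0;
    }
  }

  static void Main()
  {
    Console.WriteLine (SafeDivide (7, 2));    // 3
    Console.WriteLine (SafeDivide (-7, 2));   // -3 (toward zero, not down to -4)
    Console.WriteLine (SafeDivide (5, 0));    // 0  (exception was caught)
  }
}
```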
Overflow


At runtime, arithmetic operations on integral types can overflow. By default, this 
happens silently—no exception is thrown, and the result exhibits “wraparound” 
behavior, as though the computation were done on a larger integer type and the 
extra significant bits discarded. For example, decrementing the minimum possible 
int value results in the maximum possible int value: 


int a = int.MinValue; 
a--;
Console.WriteLine (a == int.MaxValue); // True 


Overflow check operators 


The checked operator instructs the runtime to generate an OverflowException 
rather than overflowing silently when an integral-type expression or statement 
exceeds the arithmetic limits of that type. The checked operator affects expressions 
with the ++, --, +, - (binary and unary), *, /, and explicit conversion operators 
between integral types. Overflow checking incurs a small performance cost. 


The checked operator has no effect on the double and float
types (which overflow to special “infinite” values, as you'll see
soon) and no effect on the decimal type (which is always
checked).


You can use checked around either an expression or a statement block: 


int a = 1000000;
int b = 1000000;

int c = checked (a * b);   // Checks just the expression.

checked                    // Checks all expressions
{                          // in statement block.
  c = a * b;
}


You can make arithmetic overflow checking the default for all expressions in a
program by selecting the checked option at the project level (in Visual Studio, go to
Advanced Build Settings). If you then need to disable overflow checking just for 
specific expressions or statements, you can do so with the unchecked operator. For 
example, the following code will not throw exceptions—even if the project’s checked 
option is selected: 


int x = int.MaxValue; 
int y = unchecked (x + 1); 
unchecked { int z = x + 1; } 
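Here is a minimal sketch contrasting the two behaviors with the same 1,000,000 × 1,000,000 product used earlier. Under unchecked, the product wraps to 10^12 mod 2^32, read as a signed int (-727379968). Under checked, the runtime throws OverflowException instead. Overflows is a hypothetical helper name.

```csharp
using System;

public static class OverflowDemo
{
  // Hypothetical helper: reports whether a * b overflows an int
  public static bool Overflows (int a, int b)
  {
    try
    {
      _ = checked (a * b);   // Throws instead of wrapping around
      return false;
    }
    catch (OverflowException)
    {
      return true;
    }
  }

  static void Main()
  {
    int a = 1000000, b = 1000000;
    Console.WriteLine (unchecked (a * b));         // -727379968 (wrapped)
    Console.WriteLine (Overflows (a, b));          // True
    Console.WriteLine (Overflows (1000, 1000));    // False
  }
}
```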
Overflow checking for constant expressions


Regardless of the “checked” project setting, expressions evaluated at compile time 
are always overflow-checked—unless you apply the unchecked operator: 


int x = int.MaxValue + 1;               // Compile-time error
int y = unchecked (int.MaxValue + 1);   // No errors


Bitwise operators 


C# supports the following bitwise operators: 


Operator   Meaning        Sample expression   Result
~          Complement     ~0xfU               0xfffffff0U
&          And            0xf0 & 0x33         0x30
|          Or             0xf0 | 0x33         0xf3
^          Exclusive Or   0xff00 ^ 0x0ff0     0xf0f0
<<         Shift left     0x20 << 2           0x80
>>         Shift right    0x20 >> 1           0x10
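The following sketch evaluates each sample expression from the table and prints the result in hexadecimal with the "x" format string, so the output lines up with the Result column. BitwiseDemo is an illustrative name.

```csharp
using System;

public static class BitwiseDemo
{
  static void Main()
  {
    Console.WriteLine ((~0xfU).ToString ("x"));             // fffffff0
    Console.WriteLine ((0xf0 & 0x33).ToString ("x"));       // 30
    Console.WriteLine ((0xf0 | 0x33).ToString ("x"));       // f3
    Console.WriteLine ((0xff00 ^ 0x0ff0).ToString ("x"));   // f0f0
    Console.WriteLine ((0x20 << 2).ToString ("x"));         // 80
    Console.WriteLine ((0x20 >> 1).ToString ("x"));         // 10
  }
}
```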





8- and 16-Bit Integral Types 


The 8- and 16-bit integral types are byte, sbyte, short, and ushort. These types 
lack their own arithmetic operators, so C# implicitly converts them to larger types 
as required. This can cause a compile-time error when trying to assign the result 
back to a small integral type: 


short x = 1, y = 1;
short z = x + y;          // Compile-time error


In this case, x and y are implicitly converted to int so that the addition can be
performed. This means that the result is also an int, which cannot be implicitly
converted back to a short (because it could cause loss of data). To make this compile,
we must add an explicit cast:


short z = (short) (x + y);  // OK 
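The sketch below shows the promotion directly: the sum of two shorts is an Int32. It also shows that the explicit cast silently wraps when the int result does not fit, assuming the default unchecked context (60000 - 65536 = -5536). SmallIntDemo is an illustrative name.

```csharp
using System;

public static class SmallIntDemo
{
  static void Main()
  {
    short x = 1, y = 1;
    Console.WriteLine ((x + y).GetType());   // System.Int32 (operands promoted to int)

    short z = (short)(x + y);                // Explicit cast back to short
    Console.WriteLine (z);                   // 2

    short big = 30000;
    short wrapped = (short)(big + big);      // 60000 does not fit in 16 bits
    Console.WriteLine (wrapped);             // -5536
  }
}
```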


Special Float and Double Values 


Unlike integral types, floating-point types have values that certain operations treat
specially. These special values are NaN (Not a Number), +∞, -∞, and -0. The float
and double types have constants for NaN, +∞, and -∞, as well as other values
(MaxValue, MinValue, and Epsilon); for example:


Console.WriteLine (double.NegativeInfinity); // -Infinity 


The constants that represent special values for double and float are as follows: 
Special value   Double constant            Float constant
NaN             double.NaN                 float.NaN
+∞              double.PositiveInfinity    float.PositiveInfinity
-∞              double.NegativeInfinity    float.NegativeInfinity
-0              -0.0                       -0.0f


Dividing a nonzero number by zero results in an infinite value: 
Console.WriteLine (1.0 / 0.0); // Infinity 
Console.WriteLine (-1.0 / 0.0); // -Infinity 
Console.WriteLine ( 1.0 / -0.0); // -Infinity 
Console.WriteLine (-1.0 / -0.0); // Infinity 


Dividing zero by zero, or subtracting infinity from infinity, results in a NaN: 


Console.WriteLine ( 0.0 / 0.0); // NaN 

Console.WriteLine ((1.0 / 0.0) - (1.0 / 0.0)); // NaN 
When using ==, a NaN value is never equal to another value, even another NaN 
value: 


Console.WriteLine (0.0 / 0.0 == double.NaN); // False 


To test whether a value is NaN, you must use the float.IsNaN or double.IsNaN
method:


Console.WriteLine (double.IsNaN (0.0 / 0.0)); // True 
When using object.Equals, however, two NaN values are equal:
Console.WriteLine (object.Equals (0.0 / 0.0, double.NaN)); // True 


NaNs are sometimes useful in representing special values. In
Windows Presentation Foundation (WPF), double.NaN represents
a measurement whose value is “Automatic”. Another way
to represent such a value is with a nullable type (Chapter 4);
another is with a custom struct that wraps a numeric type and
adds an additional field (Chapter 3).


float and double follow the specification of the IEEE 754 format types, supported 
natively by almost all processors. You can find detailed information on the behavior 
of these types on the IEEE website. 


double Versus decimal 


double is useful for scientific computations (such as computing spatial coordinates). 
decimal is useful for financial computations and values that are man-made rather 
than the result of real-world measurements. Here’s a summary of the differences. 
Category                  double                       decimal
Internal representation   Base 2                       Base 10
Decimal precision         15-16 significant figures    28-29 significant figures
Range                     ±(~10^-324 to ~10^308)       ±(~10^-28 to ~10^28)
Special values            +0, -0, +∞, -∞, and NaN      None
Speed                     Native to processor          Non-native to processor (about 10 times slower than double)





Real-Number Rounding Errors 


float and double internally represent numbers in base 2. For this reason, only 
numbers expressible in base 2 are represented precisely. Practically, this means most 
literals with a fractional component (which are in base 10) will not be represented 
precisely; for example: 


float tenth = 0.1f; // Not quite 0.1 
float one = 1f; 
Console.WriteLine (one - tenth * 10f); // -1.490116E-08 


This is why float and double are bad for financial calculations. In contrast, 
decimal works in base 10 and so can precisely represent numbers expressible in 
base 10 (as well as its factors, base 2 and base 5). Because real literals are in base 10, 
decimal can precisely represent numbers such as 0.1. However, neither double nor 
decimal can precisely represent a fractional number whose base 10 representation is 
recurring: 


decimal m = 1M / 6M;     // 0.1666666666666666666666666667M
double  d = 1.0 / 6.0;   // 0.16666666666666666


This leads to accumulated rounding errors: 


decimal notQuiteWholeM = m+m+m+m+m+m;  // 1.0000000000000000000000000002M
double notQuiteWholeD = d+d+d+d+d+d; // 0.99999999999999989 


which breaks equality and comparison operations: 


Console.WriteLine (notQuiteWholeM == 1M); // False 
Console.WriteLine (notQuiteWholeD < 1.0); // True 


Boolean Type and Operators 


C#’s bool type (aliasing the System.Boolean type) is a logical value that can be 
assigned the literal true or false. 


Although a Boolean value requires only one bit of storage, the runtime will use one 
byte of memory because this is the minimum chunk that the runtime and processor 
can efficiently work with. To avoid space inefficiency in the case of arrays, .NET 
provides a BitArray class in the System.Collections namespace that is designed to 
use just one bit per Boolean value. 
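Here is a short sketch of BitArray in use. It creates 1,000 flags, sets two of them, and counts the set flags by enumerating the collection, which yields its values as bool. How BitArray packs the bits internally is an implementation detail. BitArrayDemo and the chosen indexes are illustrative.

```csharp
using System;
using System.Collections;

public static class BitArrayDemo
{
  static void Main()
  {
    var flags = new BitArray (1000);   // 1,000 Boolean values, all initially false
    flags[3] = true;
    flags[999] = true;

    int count = 0;
    foreach (bool flag in flags)       // Enumerates each value as a bool
      if (flag) count++;

    Console.WriteLine (flags.Length);  // 1000
    Console.WriteLine (flags[3]);      // True
    Console.WriteLine (count);         // 2
  }
}
```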





Boolean Type and Operators | 43 







bool Conversions 


No casting conversions can be made from the bool type to numeric types, or vice 
versa. 


Equality and Comparison Operators 


== and != test for equality and inequality of any type but always return a bool
value.³ Value types typically have a very simple notion of equality:


int x = 1;
int y = 2;
int z = 1; 
Console.WriteLine (x == y); // False 
Console.WriteLine (x == z); // True 


For reference types, equality, by default, is based on reference, as opposed to the 
actual value of the underlying object (more on this in Chapter 6): 


public class Dude
{
  public string Name;
  public Dude (string n) { Name = n; }
}


Dude d1 = new Dude ("John"); 
Dude d2 = new Dude ("John"); 


Console.WriteLine (d1 == d2); // False 
Dude d3 = d1; 
Console.WriteLine (d1 == d3); // True 
The equality and comparison operators, ==, !=, <, >, >=, and <=, work for all numeric
types, but you should use them with caution with real numbers (as we saw in
“Real-Number Rounding Errors” on page 43). The comparison operators also work on
enum type members by comparing their underlying integral-type values. We 
describe this in “Enums” on page 131 in Chapter 3. 


We explain the equality and comparison operators in greater detail in “Operator 
Overloading” on page 216 in Chapter 4, and in “Equality Comparison” on page 296 
and “Order Comparison” on page 306 in Chapter 6. 


Conditional Operators 


The && and || operators test for and and or conditions. They are frequently used in 
conjunction with the ! operator, which expresses not. In the following example, the 
UseUmbrella method returns true if it’s rainy or sunny (to protect us from the rain 
or the sun), as long as it’s not also windy (umbrellas are useless in the wind): 





3 It’s possible to overload these operators (Chapter 4) such that they return a non-bool type, but
this is almost never done in practice. 
static bool UseUmbrella (bool rainy, bool sunny, bool windy)
{
  return !windy && (rainy || sunny);
}
The && and || operators short-circuit evaluation when possible. In the preceding 
example, if it is windy, the expression (rainy || sunny) is not even evaluated. 


Short-circuiting is essential in allowing expressions such as the following to run 
without throwing a NullReferenceException: 


if (sb != null && sb.Length > 0) ... 
The & and | operators also test for and and or conditions: 
return !windy & (rainy | sunny); 


The difference is that they do not short-circuit. For this reason, they are rarely used 
in place of conditional operators. 
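The difference is easy to observe with a side effect. In the sketch below, a hypothetical Check method counts how often it runs. With windy set to true, && skips the right-hand side entirely, whereas & and | evaluate every operand. ShortCircuitDemo, Check, and CountCalls are illustrative names.

```csharp
using System;

public static class ShortCircuitDemo
{
  static int calls;   // Counts how many times Check runs

  static bool Check (bool result) { calls++; return result; }

  public static int CountCalls (bool useShortCircuit)
  {
    calls = 0;
    bool windy = true;
    bool useUmbrella = useShortCircuit
      ? !windy && (Check (true) || Check (false))   // Right side skipped: !windy is false
      : !windy &  (Check (true) |  Check (false));  // & and | evaluate every operand
    return calls;
  }

  static void Main()
  {
    Console.WriteLine (CountCalls (true));    // 0
    Console.WriteLine (CountCalls (false));   // 2
  }
}
```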


Unlike in C and C++, the & and | operators perform (non- 
short-circuiting) Boolean comparisons when applied to bool 
expressions. The & and | operators perform bitwise operations 
only when applied to numbers. 


Conditional operator (ternary operator) 


The conditional operator (more commonly called the ternary operator because it’s
the only operator that takes three operands) has the form q ? a : b; thus, if
condition q is true, a is evaluated, else b is evaluated:


static int Max (int a, int b)
{
  return (a > b) ? a : b;
}


The conditional operator is particularly useful in LINQ expressions (Chapter 8). 


Strings and Characters 


C#’s char type (aliasing the System.Char type) represents a Unicode character and 
occupies 2 bytes (UTF-16). A char literal is specified within single quotes: 


char c = 'A'; // Simple character 


Escape sequences express characters that cannot be expressed or interpreted literally. 
An escape sequence is a backslash followed by a character with a special meaning; 
for example: 


char newLine = '\n'; 
char backSlash = '\\'; 


Table 2-2 shows the escape sequence characters. 
Table 2-2. Escape sequence characters


Char   Meaning           Value
\'     Single quote      0x0027
\"     Double quote      0x0022
\\     Backslash         0x005C
\0     Null              0x0000
\a     Alert             0x0007
\b     Backspace         0x0008
\f     Form feed         0x000C
\n     Newline           0x000A
\r     Carriage return   0x000D
\t     Horizontal tab    0x0009
\v     Vertical tab      0x000B





The \u (or \x) escape sequence lets you specify any Unicode character via its
four-digit hexadecimal code:


char copyrightSymbol = '\u00A9';
char omegaSymbol = '\u03A9'; 
char newLine = '\u000A'; 


char Conversions 


An implicit conversion from a char to a numeric type works for the numeric types 
that can accommodate an unsigned short. For other numeric types, an explicit
conversion is required.
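The following sketch illustrates the rule. A char converts implicitly to ushort and to int, but short needs a cast because it cannot hold char values above 32767. Adding an int to a char produces an int, so getting a char back also requires a cast. CharConversionDemo is an illustrative name.

```csharp
using System;

public static class CharConversionDemo
{
  static void Main()
  {
    char c = 'A';
    ushort u = c;                  // Implicit: ushort covers every char value
    int i = c;                     // Implicit: int is wider still
    short s = (short)c;            // Explicit: short cannot hold values above 32767
    char next = (char)(c + 1);     // char + int yields int, so cast back

    Console.WriteLine (u);                 // 65
    Console.WriteLine (i);                 // 65
    Console.WriteLine (s);                 // 65
    Console.WriteLine (next);              // B
    Console.WriteLine ((int)'\u00A9');     // 169 (0xA9)
  }
}
```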


String Type 


C#’s string type (aliasing the System.String type, covered in depth in Chapter 6) 
represents an immutable (unmodifiable) sequence of Unicode characters. A string 
literal is specified within double quotes: 


string a = "Heat"; 


string is a reference type rather than a value type. Its equality 
operators, however, follow value-type semantics: 


string a = "test"; 
string b = "test"; 
Console.Write (a == b); // True 


The escape sequences that are valid for char literals also work inside strings: 
string a = "Here's a tab:\t"; 


The cost of this is that whenever you need a literal backslash, you must write it 
twice: 
string a1 = "\\\\server\\fileshare\\helloworld.cs";


To avoid this problem, C# allows verbatim string literals. A verbatim string literal is 
prefixed with @ and does not support escape sequences. The following verbatim 
string is identical to the preceding one: 


string a2 = @"\\server\fileshare\helloworld.cs"; 
A verbatim string literal can also span multiple lines: 


string escaped = "First Line\r\nSecond Line"; 
string verbatim = @"First Line 
Second Line"; 


// True if your text editor uses CR-LF line separators: 
Console.WriteLine (escaped == verbatim); 


You can include the double-quote character in a verbatim literal by writing it twice: 


string xml = @"<customer id=""123""></customer>"; 
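To confirm that the two spellings produce the same string, the sketch below compares the escaped and verbatim forms of the file-share path. It also prints the XML literal to show that a doubled quote becomes a single one. VerbatimDemo is an illustrative name.

```csharp
using System;

public static class VerbatimDemo
{
  static void Main()
  {
    string a1 = "\\\\server\\fileshare\\helloworld.cs";   // Escaped
    string a2 = @"\\server\fileshare\helloworld.cs";      // Verbatim
    Console.WriteLine (a1 == a2);                         // True

    string xml = @"<customer id=""123""></customer>";
    Console.WriteLine (xml);                              // <customer id="123"></customer>
  }
}
```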


String concatenation 
The + operator concatenates two strings: 
string s = "a" + "b"; 
One of the operands might be a nonstring value, in which case ToString is called on
that value:


string s = "a" + 5;  // a5


Using the + operator repeatedly to build up a string is inefficient: a better solution is
to use the System.Text.StringBuilder type (described in Chapter 6).
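As a minimal sketch of the StringBuilder alternative, the loop below appends to a single growable buffer instead of creating a new string on each iteration. It converts the buffer to a string only once at the end. ConcatDemo and the comma-separated output format are illustrative.

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
  public static string BuildList (int count)
  {
    var sb = new StringBuilder();
    for (int i = 0; i < count; i++)
      sb.Append (i).Append (',');   // Appends in place; no intermediate strings
    return sb.ToString();           // One final string
  }

  static void Main()
  {
    Console.WriteLine ("a" + 5);          // a5
    Console.WriteLine (BuildList (5));    // 0,1,2,3,4,
  }
}
```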


String interpolation 


A string preceded with the $ character is called an interpolated string. Interpolated 
strings can include expressions enclosed in braces: 

int x = 4; 

Console.Write ($"A square has {x} sides"); // Prints: A square has 4 sides 
Any valid C# expression of any type can appear within the braces, and C# will
convert the expression to a string by calling its ToString method or equivalent. You can
change the formatting by appending the expression with a colon and a format string
(format strings are described in “string.Format and composite format strings” on
page 248 in Chapter 6):


string s = $"255 in hex is {byte.MaxValue:X2}"; // X2 = 2-digit hexadecimal 
// Evaluates to "255 in hex is FF" 


Interpolated strings must complete on a single line, unless you also specify the
verbatim string operator:


int x = 2;

// Note that $ must appear before @ prior to C# 8:
string s = $@"this spans {
x} lines";


To include a brace literal in an interpolated string, repeat the desired brace 
character. 
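The sketch below combines the interpolation features just described: a plain expression, doubled braces for literal braces, a format string, and a conditional expression. The conditional must be wrapped in parentheses so that its colon is not read as the start of a format string. InterpolationDemo is an illustrative name.

```csharp
using System;

public static class InterpolationDemo
{
  static void Main()
  {
    int x = 4;
    Console.WriteLine ($"A square has {x} sides");                  // A square has 4 sides
    Console.WriteLine ($"{{x}} = {x}");                             // {x} = 4
    Console.WriteLine ($"{byte.MaxValue:X2}");                      // FF
    Console.WriteLine ($"{x * x} is {(x > 3 ? "big" : "small")}");  // 16 is big
  }
}
```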


String comparisons 


string does not support < and > operators for comparisons. You must use the 
string’s CompareTo method, described in Chapter 6. 
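CompareTo returns a negative number, zero, or a positive number, depending on whether the string sorts before, equal to, or after its argument. The sketch below tests only the sign, which is all the contract guarantees. CompareTo is culture-sensitive, but the simple lowercase words chosen here are assumed to order the same way in common cultures. CompareDemo is an illustrative name.

```csharp
using System;

public static class CompareDemo
{
  static void Main()
  {
    Console.WriteLine ("apple".CompareTo ("banana") < 0);   // True ("apple" sorts first)
    Console.WriteLine ("pear".CompareTo ("pear") == 0);     // True
    Console.WriteLine ("pear".CompareTo ("kiwi") > 0);      // True ("pear" sorts after)
  }
}
```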


Arrays 


An array represents a fixed number of variables (called elements) of a particular 
type. The elements in an array are always stored in a contiguous block of memory, 
providing highly efficient access. 


An array is denoted with square brackets after the element type: 
char[] vowels = new char[5]; // Declare an array of 5 characters 


Square brackets also index the array, accessing a particular element by position: 


vowels[0] = 'a'; 
vowels[1] = 'e'; 
vowels[2] = 'i'; 
vowels[3] = 'o'; 
vowels[4] = 'u'; 
Console.WriteLine (vowels[1]);  // e


This prints “e” because array indexes start at 0. We can use a for loop statement to 
iterate through each element in the array. The for loop in this example cycles the 
integer i from 0 to 4: 


for (int i = 0; i < vowels.Length; i++) 
Console.Write (vowels[i]); // aeiou 


The Length property of an array returns the number of elements in the array. After
an array has been created, you cannot change its length. The System.Collections
namespace and subnamespaces provide higher-level data structures, such as
dynamically sized arrays and dictionaries.


An array initialization expression lets you declare and populate an array in a single 
step: 


char[] vowels = new char[] {'a','e','i','o','u'};
or simply: 


char[] vowels = {'a','e','i','o','u'}; 
All arrays inherit from the System.Array class, providing common services for all
arrays. These members include methods to get and set elements regardless of the
array type. We describe them in “The Array Class” on page 327 in Chapter 7.
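To illustrate the type-agnostic members of System.Array, the following sketch uses the real GetValue and SetValue methods through an Array-typed variable; the ArrayClassDemo class and its Middle helper are hypothetical:

```csharp
using System;

static class ArrayClassDemo
{
  // Works for any array, because every array type derives from System.Array.
  public static object Middle (Array arr) => arr.GetValue (arr.Length / 2);

  static void Main()
  {
    Array numbers = new int[] { 10, 20, 30 };
    numbers.SetValue (99, 1);                                      // same as ((int[])numbers)[1] = 99
    Console.WriteLine (Middle (numbers));                          // 99
    Console.WriteLine (Middle (new string[] { "x", "y", "z" }));   // y
  }
}
```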


Default Element Initialization 


Creating an array always preinitializes the elements with default values. The default 
value for a type is the result of a bitwise zeroing of memory. For example, consider 
creating an array of integers. Because int is a value type, this allocates 1,000 integers 
in one contiguous block of memory. The default value for each element will be 0: 


int[] a = new int[1000]; 
Console.Write (a[123]);  // 0
Value types versus reference types 


Whether an array element type is a value type or a reference type has important performance
implications. When the element type is a value type, each element value is
allocated as part of the array, as shown here:


public struct Point { public int X, Y; } 


Point[] a = new Point[1000]; 
int x = a[500].X;  // 0


Had Point been a class, creating the array would have merely allocated 1,000 null 
references: 


public class Point { public int X, Y; } 


Point[] a = new Point[1000]; 
int x = a[500].X; // Runtime error, NullReferenceException 


To avoid this error, we must explicitly instantiate 1,000 Points after instantiating the 
array: 


Point[] a = new Point[1000];
for (int i = 0; i < a.Length; i++)  // Iterate i from 0 to 999
  a[i] = new Point();               // Set array element i with new point


An array itself is always a reference-type object, regardless of the element type. For 
instance, the following is legal: 


int[] a = null; 


Indices and Ranges (C# 8) 


C# 8 introduces indices and ranges to simplify working with elements or portions of 
an array. 
Indices and ranges also work with the CLR types Span<T> and
ReadOnlySpan<T> (see “Span<T> and Memory<T>” on page 
239 in Chapter 5). 

You can also make your own types work with indices and 
ranges, by defining an indexer of type Index or Range (see 
“Indexers” on page 102 in Chapter 3). 


Indices 


Indices let you refer to elements relative to the end of an array, with the ^ operator.
^1 refers to the last element, ^2 refers to the second-to-last element, and so on:

char[] vowels = new char[] {'a','e','i','o','u'};
char lastElement  = vowels [^1];   // 'u'
char secondToLast = vowels [^2];   // 'o'

(^0 equals the length of the array, so vowels[^0] generates an error.)


C# implements indices with the help of the Index type, so you can also do the 
following: 


Index first = 0;
Index last = ^1;
char firstElement = vowels [first];   // 'a'
char lastElement = vowels [last];     // 'u'


Ranges 
Ranges let you “slice” an array by using the .. operator: 


char[] firstTwo  = vowels [..2];    // 'a', 'e'
char[] lastThree = vowels [2..];    // 'i', 'o', 'u'
char[] middleOne = vowels [2..3];   // 'i'


The second number in the range is exclusive, so ..2 returns the elements before 
vowels[2]. 


You can also use the ^ symbol in ranges. The following returns the last two
characters:

char[] lastTwo = vowels [^2..];   // 'o', 'u'


C# implements ranges with the help of the Range type, so you can also do the 
following: 


Range firstTwoRange = 0..2; 
char[] firstTwo = vowels [firstTwoRange]; // 'a', 'e' 
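To see how ^ and .. translate to ordinary offsets, this sketch asks the System.Index and System.Range types to resolve themselves against an array length via their GetOffset and GetOffsetAndLength methods (the demo class and the LastTwo helper are illustrative):

```csharp
using System;

static class IndexRangeDemo
{
  // Slices the final two characters using a from-end range.
  public static string LastTwo (char[] chars) => new string (chars [^2..]);

  static void Main()
  {
    char[] vowels = { 'a', 'e', 'i', 'o', 'u' };

    Index last = ^1;
    Console.WriteLine (last.IsFromEnd);                   // True
    Console.WriteLine (last.GetOffset (vowels.Length));   // 4, i.e. Length - 1

    Range lastTwo = ^2..;
    var (offset, length) = lastTwo.GetOffsetAndLength (vowels.Length);
    Console.WriteLine ($"{offset}, {length}");            // 3, 2

    Console.WriteLine (LastTwo (vowels));                 // ou
  }
}
```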


Multidimensional Arrays 


Multidimensional arrays come in two varieties: rectangular and jagged. Rectangular 
arrays represent an n-dimensional block of memory, and jagged arrays are arrays of 
arrays. 
Rectangular arrays


Rectangular arrays are declared using commas to separate each dimension. The following
declares a rectangular two-dimensional array for which the dimensions are 3
by 3:

int[,] matrix = new int[3,3];

The GetLength method of an array returns the length for a given dimension (starting
at 0):

for (int i = 0; i < matrix.GetLength(0); i++)
  for (int j = 0; j < matrix.GetLength(1); j++)
    matrix[i,j] = i * 3 + j;

You can initialize a rectangular array with explicit values. The following code creates
an array identical to the previous example:

int[,] matrix = new int[,]
{
  {0,1,2},
  {3,4,5},
  {6,7,8}
};


Jagged arrays 


Jagged arrays are declared using successive square brackets to represent each 
dimension. Here is an example of declaring a jagged two-dimensional array for 
which the outermost dimension is 3: 


int[][] matrix = new int[3][];


Interestingly, this is new int[3][] and not new int[][3]. 
Eric Lippert has written an excellent article on why this is so. 


The inner dimensions aren't specified in the declaration because, unlike a rectangular
array, each inner array can be an arbitrary length. Each inner array is implicitly
initialized to null rather than an empty array. You must manually create each inner
array:

for (int i = 0; i < matrix.Length; i++)
{
  matrix[i] = new int[3];                    // Create inner array
  for (int j = 0; j < matrix[i].Length; j++)
    matrix[i][j] = i * 3 + j;
}


You can initialize a jagged array with explicit values. The following code creates an
array identical to the previous example with an additional element at the end:

int[][] matrix = new int[][]
{
  new int[] {0,1,2},
  new int[] {3,4,5},
  new int[] {6,7,8,9}
};
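The following sketch contrasts the two varieties: a rectangular array is a single object whose Rank is 2 and whose Length counts every element, whereas a jagged array is a one-dimensional array whose rows can differ in length. The MultiDimDemo class and RowLengths helper are hypothetical:

```csharp
using System;

static class MultiDimDemo
{
  // Collects the length of each inner array of a jagged array.
  public static int[] RowLengths (int[][] jagged)
  {
    int[] lengths = new int[jagged.Length];
    for (int i = 0; i < jagged.Length; i++)
      lengths[i] = jagged[i].Length;
    return lengths;
  }

  static void Main()
  {
    int[,] rect = new int[3, 3];
    int[][] jagged =
    {
      new int[] { 0, 1, 2 },
      new int[] { 3, 4, 5 },
      new int[] { 6, 7, 8, 9 }
    };

    // One rectangular object: Rank 2, and Length counts all 9 elements.
    Console.WriteLine ($"{rect.Rank} {rect.Length}");               // 2 9

    // An array of arrays: Rank 1, and Length counts only the 3 rows.
    Console.WriteLine ($"{jagged.Rank} {jagged.Length}");           // 1 3
    Console.WriteLine (string.Join (" ", RowLengths (jagged)));     // 3 3 4
  }
}
```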


Simplified Array Initialization Expressions 


There are two ways to shorten array initialization expressions. The first is to omit 
the new operator and type qualifications: 


char[] vowels = {'a','e','i','o','u'}; 


int[,] rectangularMatrix =
{
  {0,1,2},
  {3,4,5},
  {6,7,8}
};

int[][] jaggedMatrix =
{
  new int[] {0,1,2},
  new int[] {3,4,5},
  new int[] {6,7,8,9}
};


The second approach is to use the var keyword, which instructs the compiler to 
implicitly type a local variable: 


var i = 3;           // i is implicitly of type int
var s = "Sausage";   // s is implicitly of type string

// Therefore:

var rectMatrix = new int[,]    // rectMatrix is implicitly of type int[,]
{
  {0,1,2},
  {3,4,5},
  {6,7,8}
};

var jaggedMat = new int[][]    // jaggedMat is implicitly of type int[][]
{
  new int[] {0,1,2},
  new int[] {3,4,5},
  new int[] {6,7,8,9}
};
Implicit typing can be taken one stage further with arrays: you can omit the type 
qualifier after the new keyword and have the compiler infer the array type: 


var vowels = new[] {'a','e','i','o','u'};  // Compiler infers char[]


For this to work, the elements must all be implicitly convertible to a single type (and 
at least one of the elements must be of that type, and there must be exactly one best 
type), as in the following example: 
var x = new[] {1,10000000000};   // all convertible to long
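A short sketch of the inference rule, using the reflection method Type.GetElementType to report which element type the compiler chose (the helper name ElementTypeOf is illustrative):

```csharp
using System;

static class ImplicitArrayDemo
{
  // Reports the element type the compiler chose for an array.
  public static Type ElementTypeOf (Array a) => a.GetType().GetElementType();

  static void Main()
  {
    var chars = new[] { 'a', 'e', 'i', 'o', 'u' };   // inferred as char[]
    var longs = new[] { 1, 10000000000 };            // inferred as long[]; 1 widens to long
    Console.WriteLine (ElementTypeOf (chars));       // System.Char
    Console.WriteLine (ElementTypeOf (longs));       // System.Int64

    // var bad = new[] { 1, "one" };   // Compile-time error: no best type for the elements
  }
}
```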


Bounds Checking 


All array indexing is bounds-checked by the runtime. An IndexOutOfRangeException
is thrown if you use an invalid index:


int[] arr = new int[3]; 
arr[3] = 1; // IndexOutOfRangeException thrown 


Array bounds checking is necessary for type safety and simplifies debugging. 


Generally, the performance hit from bounds checking is
minor, and the Just-In-Time (JIT) compiler can perform
optimizations, such as determining in advance whether all indexes
will be safe before entering a loop, thus avoiding a check on
each iteration. In addition, C# provides “unsafe” code that can
explicitly bypass bounds checking (see “Unsafe Code and
Pointers” on page 219 in Chapter 4).
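To make the runtime check visible, this sketch catches the exception for an out-of-range index. The GetOrDefault helper is hypothetical; in real code, testing the index against Length up front is cheaper than catching the exception:

```csharp
using System;

static class BoundsDemo
{
  // Returns arr[i], or fallback when i is outside 0..arr.Length-1.
  public static int GetOrDefault (int[] arr, int i, int fallback)
  {
    try { return arr[i]; }
    catch (IndexOutOfRangeException) { return fallback; }
  }

  static void Main()
  {
    int[] arr = new int[3];
    Console.WriteLine (GetOrDefault (arr, 3, -1));    // -1: index 3 is past the end
    Console.WriteLine (GetOrDefault (arr, -1, -1));   // -1: negative indexes are rejected too
    Console.WriteLine (GetOrDefault (arr, 2, -1));    // 0: a valid, default-initialized element
  }
}
```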


Variables and Parameters 


A variable represents a storage location that has a modifiable value. A variable can 
be a local variable, parameter (value, ref, out, or in), field (instance or static), or array 
element. 


The Stack and the Heap 


The stack and the heap are the places where variables reside. Each has very different 
lifetime semantics. 


Stack 


The stack is a block of memory for storing local variables and parameters. The stack 
logically grows and shrinks as a method or function is entered and exited. Consider 
the following method (to avoid distraction, input argument checking is ignored): 


static int Factorial (int x)
{
  if (x == 0) return 1;
  return x * Factorial (x-1);
}


This method is recursive, meaning that it calls itself. Each time the method is 
entered, a new int is allocated on the stack, and each time the method exits, the int 
is deallocated. 


Heap 


The heap is the memory in which objects (i.e., reference-type instances) reside. 
Whenever a new object is created, it is allocated on the heap, and a reference to that 
object is returned. During a program’s execution, the heap begins filling up as 
new objects are created. The runtime has a garbage collector that periodically
deallocates objects from the heap, so your program does not run out of memory. An
object is eligible for deallocation as soon as it’s not referenced by anything that’s
itself alive.


In the following example, we begin by creating a StringBuilder object referenced
by the variable ref1 and then write out its content. That StringBuilder object is
then immediately eligible for garbage collection because nothing subsequently uses
it.

Then, we create another StringBuilder referenced by variable ref2 and copy that
reference to ref3. Even though ref2 is not used after that point, ref3 keeps the
same StringBuilder object alive, ensuring that it doesn't become eligible for
collection until we've finished using ref3.


using System;
using System.Text;

class Test
{
  static void Main()
  {
    StringBuilder ref1 = new StringBuilder ("object1");
    Console.WriteLine (ref1);
    // The StringBuilder referenced by ref1 is now eligible for GC.

    StringBuilder ref2 = new StringBuilder ("object2");
    StringBuilder ref3 = ref2;
    // The StringBuilder referenced by ref2 is NOT yet eligible for GC.

    Console.WriteLine (ref3);   // object2
  }
}


Value-type instances (and object references) live wherever the variable was declared.
If the instance was declared as a field within a class type, or as an array element, that
instance lives on the heap.


You can't explicitly delete objects in C#, as you can in C++. An
unreferenced object is eventually collected by the garbage
collector.


The heap also stores static fields. Unlike objects allocated on the heap (which can be 
garbage-collected), these live until the application domain is torn down. 


Definite Assignment 


C# enforces a definite assignment policy. In practice, this means that outside of an 
unsafe context, it’s impossible to access uninitialized memory. Definite assignment 
has three implications: 
• Local variables must be assigned a value before they can be read.
• Function arguments must be supplied when a method is called (unless marked
  as optional; see “Optional parameters” on page 61).
• All other variables (such as fields and array elements) are automatically
  initialized by the runtime.

For example, the following code results in a compile-time error:

static void Main()
{
  int x;
  Console.WriteLine (x);   // Compile-time error
}


Fields and array elements are automatically initialized with the default values for
their type. The following code outputs 0 because array elements are implicitly
assigned to their default values:

static void Main()
{
  int[] ints = new int[2];
  Console.WriteLine (ints[0]);   // 0
}


The following code outputs 0, because fields are implicitly assigned a default value:

class Test
{
  static int x;
  static void Main() { Console.WriteLine (x); }   // 0
}

Default Values

All type instances have a default value. The default value for the predefined types is
the result of a bitwise zeroing of memory:

Type                          Default value
All reference types           null
All numeric and enum types    0
char type                     '\0'
bool type                     false

You can obtain the default value for any type via the default keyword:

Console.WriteLine (default (decimal));   // 0
From C# 7.1, you can optionally omit the type when it can be inferred: 


decimal d = default; 
The default value in a custom value type (i.e., struct) is the same as the default
value for each field defined by the custom type.
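As a small illustration, assuming a hypothetical struct named Sample with an int, a bool, and a string field, the default value zeroes each field according to the preceding table:

```csharp
using System;

// Hypothetical struct, used only for this illustration.
struct Sample
{
  public int X;
  public bool Visible;
  public string Label;   // reference-type field
}

static class DefaultStructDemo
{
  static void Main()
  {
    Sample s = default;                    // every field is zeroed
    Console.WriteLine (s.X);               // 0
    Console.WriteLine (s.Visible);         // False
    Console.WriteLine (s.Label == null);   // True
  }
}
```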
Parameters 


A method may have a sequence of parameters. Parameters define the set of arguments
that must be provided for that method. In the following example, the method
Foo has a single parameter named p, of type int:

static void Foo (int p)
{
  p = p + 1;                // Increment p by 1
  Console.WriteLine (p);    // Write p to screen
}

static void Main()
{
  Foo (8);                  // Call Foo with an argument of 8
}


You can control how parameters are passed with the ref, in, and out modifiers: 





Parameter modifier    Passed by                 Variable must be definitely assigned
(None)                Value                     Going in
ref                   Reference                 Going in
in                    Reference (read-only)     Going in
out                   Reference                 Going out





Passing arguments by value 


By default, arguments in C# are passed by value, which is by far the most common 
case. This means that a copy of the value is created when passed to the method: 


class Test
{
  static void Foo (int p)
  {
    p = p + 1;                // Increment p by 1
    Console.WriteLine (p);    // Write p to screen
  }

  static void Main()
  {
    int x = 8;
    Foo (x);                  // Make a copy of x
    Console.WriteLine (x);    // x will still be 8
  }
}
Assigning p a new value does not change the contents of x, because p and x reside in
different memory locations. 


Passing a reference-type argument by value copies the reference, but not the object.
In the following example, Foo sees the same StringBuilder object that Main instantiated,
but has an independent reference to it. In other words, sb and fooSB are separate
variables that reference the same StringBuilder object:

class Test
{
  static void Foo (StringBuilder fooSB)
  {
    fooSB.Append ("test");
    fooSB = null;
  }

  static void Main()
  {
    StringBuilder sb = new StringBuilder();
    Foo (sb);
    Console.WriteLine (sb.ToString());   // test
  }
}


Because fooSB is a copy of a reference, setting it to null doesn’t make sb null. (If,
however, fooSB was declared and called with the ref modifier, sb would become
null.)
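The parenthetical case can be sketched as follows: the same Foo as above, but with fooSB declared and passed as ref, so setting it to null also nulls the caller's variable (the class name is illustrative):

```csharp
using System;
using System.Text;

static class RefReferenceDemo
{
  public static void Foo (ref StringBuilder fooSB)
  {
    fooSB.Append ("test");   // mutates the shared object
    fooSB = null;            // fooSB aliases the caller's variable, so this nulls it too
  }

  static void Main()
  {
    StringBuilder sb = new StringBuilder();
    Foo (ref sb);
    Console.WriteLine (sb == null);   // True
  }
}
```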
The ref modifier 


To pass by reference, C# provides the ref parameter modifier. In the following
example, p and x refer to the same memory locations:

class Test
{
  static void Foo (ref int p)
  {
    p = p + 1;                // Increment p by 1
    Console.WriteLine (p);    // Write p to screen
  }

  static void Main()
  {
    int x = 8;
    Foo (ref x);              // Ask Foo to deal directly with x
    Console.WriteLine (x);    // x is now 9
  }
}
Now assigning p a new value changes the contents of x. Notice how the ref modifier
is required both when writing and when calling the method.⁴ This makes it very
clear what's going on.

The ref modifier is essential in implementing a swap method (in “Generics” on
page 135 in Chapter 3, we show how to write a swap method that works with any
type):


class Test
{
  static void Swap (ref string a, ref string b)
  {
    string temp = a;
    a = b;
    b = temp;
  }

  static void Main()
  {
    string x = "Penn";
    string y = "Teller";
    Swap (ref x, ref y);
    Console.WriteLine (x);   // Teller
    Console.WriteLine (y);   // Penn
  }
}
A parameter can be passed by reference or by value, regardless 
of whether the parameter type is a reference type or a value 
type. 
The out modifier 


An out argument is like a ref argument except for the following: 


• It need not be assigned before going into the function.
• It must be assigned before it comes out of the function.


The out modifier is most commonly used to get multiple return values back from a 
method; for example: 


class Test
{
  static void Split (string name, out string firstNames,
                     out string lastName)
  {
    int i = name.LastIndexOf (' ');
    firstNames = name.Substring (0, i);
    lastName = name.Substring (i + 1);
  }

  static void Main()
  {
    string a, b;
    Split ("Stevie Ray Vaughan", out a, out b);
    Console.WriteLine (a);   // Stevie Ray
    Console.WriteLine (b);   // Vaughan
  }
}

4 An exception to this rule is when calling Component Object Model (COM) methods. We discuss
this in Chapter 24.


Like a ref parameter, an out parameter is passed by reference. 


Out variables and discards 


From C# 7, you can declare variables on the fly when calling methods with out 
parameters. We can shorten the Main method in our preceding example as follows: 


static void Main()
{
  Split ("Stevie Ray Vaughan", out string a, out string b);
  Console.WriteLine (a);   // Stevie Ray
  Console.WriteLine (b);   // Vaughan
}


When calling methods with multiple out parameters, sometimes you're not interested
in receiving values from all the parameters. In such cases, you can discard the
ones in which you're not interested by using an underscore:


Split ("Stevie Ray Vaughan", out string a, out _); // Discard the 2nd param 
Console.WriteLine (a); 


In this case, the compiler treats the underscore as a special symbol, called a discard. 
You can include multiple discards in a single call. Assuming SomeBigMethod has 
been defined with seven out parameters, we can ignore all but the fourth, as follows: 


SomeBigMethod (out _, out _, out _, out int x, out _, out _, out _); 


For backward compatibility, this language feature will not take effect if a real underscore
variable is in scope:


string _; 
Split ("Stevie Ray Vaughan", out string a, out _); 
Console.WriteLine (_); // Vaughan 
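Discards are also handy with framework methods that use out parameters. This sketch uses the real int.TryParse method, ignoring the parsed number when only validity matters; the IsInteger helper is hypothetical:

```csharp
using System;

static class DiscardDemo
{
  // TryParse reports success via its return value; the number itself is discarded.
  public static bool IsInteger (string s) => int.TryParse (s, out _);

  static void Main()
  {
    Console.WriteLine (IsInteger ("123"));   // True
    Console.WriteLine (IsInteger ("abc"));   // False

    if (int.TryParse ("42", out int n))      // out variable declared inline
      Console.WriteLine (n + 1);             // 43
  }
}
```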


Implications of passing by reference 


When you pass an argument by reference, you alias the storage location of an existing
variable rather than create a new storage location. In the following example, the
variables x and y represent the same instance:

class Test
{
  static int x;

  static void Main() { Foo (out x); }

  static void Foo (out int y)
  {
    Console.WriteLine (x);   // x is 0
    y = 1;                   // Mutate y
    Console.WriteLine (x);   // x is 1
  }
}
The in modifier 


An in parameter is similar to a ref parameter except that the argument’s value cannot
be modified by the method (doing so generates a compile-time error). This modifier
is most useful when passing a large value type to the method because it allows
the compiler to avoid the overhead of copying the argument prior to passing it in
while still protecting the original value from modification.


Overloading solely on the presence of in is permitted: 


void Foo ( SomeBigStruct a) { ... } 
void Foo (in SomeBigStruct a) { ... } 


To call the second overload, the caller must use the in modifier: 


SomeBigStruct x = ...;
Foo (x); // Calls the first overload 
Foo (in x); // Calls the second overload 


When there's no ambiguity: 
void Bar (in SomeBigStruct a) { ... } 
use of the in modifier is optional for the caller: 


Bar (x); // OK (calls the 'in' overload) 
Bar (in x); // OK (calls the 'in' overload) 


To make this example meaningful, SomeBigStruct would be defined as a struct (see 
“Structs” on page 120 in Chapter 3). 
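A runnable sketch of the in modifier, assuming a concrete SomeBigStruct with four long fields (a stand-in for a genuinely large struct):

```csharp
using System;

// Hypothetical "large" value type; a real one would carry far more data.
struct SomeBigStruct
{
  public long A, B, C, D;
}

static class InDemo
{
  // 'in' passes by reference but makes the parameter read-only.
  public static long Sum (in SomeBigStruct s)
  {
    // s.A = 0;   // Compile-time error: s is read-only
    return s.A + s.B + s.C + s.D;
  }

  static void Main()
  {
    var big = new SomeBigStruct { A = 1, B = 2, C = 3, D = 4 };
    Console.WriteLine (Sum (in big));   // 10
    Console.WriteLine (Sum (big));      // 10 ('in' is optional at the call site here)
  }
}
```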


The params modifier 


You can specify the params parameter modifier on the last parameter of a method 
so that the method accepts any number of arguments of a particular type. The 
parameter type must be declared as an array, as shown in the following example: 


class Test
{
  static int Sum (params int[] ints)
  {
    int sum = 0;
    for (int i = 0; i < ints.Length; i++)
      sum += ints[i];                      // Increase sum by ints[i]
    return sum;
  }

  static void Main()
  {
    int total = Sum (1, 2, 3, 4);
    Console.WriteLine (total);             // 10
  }
}

You can also supply a params argument as an ordinary array. The first line in Main is 
semantically equivalent to this: 


int total = Sum (new int[] { 1, 2, 3, 4} ); 
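A brief sketch of two further consequences of params: calling with no arguments passes an empty array rather than null, and an existing array is passed through unchanged (the class name is illustrative):

```csharp
using System;

static class ParamsDemo
{
  public static int Sum (params int[] ints)
  {
    int sum = 0;
    foreach (int n in ints) sum += n;
    return sum;
  }

  static void Main()
  {
    Console.WriteLine (Sum());                     // 0: the compiler supplies an empty array
    Console.WriteLine (Sum (5));                   // 5
    Console.WriteLine (Sum (new[] { 1, 2, 3 }));   // 6: an ordinary array is passed as-is
  }
}
```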


Optional parameters 


Methods, constructors, and indexers (Chapter 3) can declare optional parameters. A 
parameter is optional if it specifies a default value in its declaration: 


void Foo (int x = 23) { Console.WriteLine (x); } 
You can omit optional parameters when calling the method: 
Foo(); // 23 


The default argument of 23 is actually passed to the optional parameter x: the compiler
bakes the value 23 into the compiled code at the calling side. The preceding call
to Foo is semantically identical to:


Foo (23); 


because the compiler simply substitutes the default value of an optional parameter 
wherever it is used. 


Adding an optional parameter to a public method that’s called
from another assembly requires recompilation of both assemblies,
just as though the parameter were mandatory.


The default value of an optional parameter must be specified by a constant expression
or a parameterless constructor of a value type. Optional parameters cannot be
marked with ref or out.


Mandatory parameters must occur before optional parameters in both the method 
declaration and the method call (the exception is with params arguments, which still 
always come last). In the following example, the explicit value of 1 is passed to x, 
and the default value of 0 is passed to y: 


void Foo (int x = 0, int y = 0) { Console.WriteLine (x +", "+ y); } 
void Test() 
{ 
Foo(1);   // 1, 0
} 
To do the converse (pass a default value to x and an explicit value to y) you must
combine optional parameters with named arguments. 


Named arguments 


Rather than identifying an argument by position, you can identify an argument by
name:

void Foo (int x, int y) { Console.WriteLine (x + ", " + y); }

void Test()
{
  Foo (x:1, y:2);   // 1, 2
}


Named arguments can occur in any order. The following calls to Foo are semantically
identical:


Foo (x:1, y:2); 
Foo (y:2, x:1); 


A subtle difference is that argument expressions are evaluated 
in the order in which they appear at the calling site. In general, 
this makes a difference only with interdependent side- 
effecting expressions such as the following, which writes 0, 1: 


int a = 0; 
Foo (y: ++a, x: --a);   // ++a is evaluated first


Of course, you would almost certainly avoid writing such code 
in practice! 


You can mix named and positional arguments: 
Foo (1, y:2); 


However, there is a restriction: positional arguments must come before named
arguments unless they are used in the correct position. So, we could call Foo like
this:

Foo (x:1, 2);   // OK. Arguments in the declared positions

but not like this:

Foo (y:2, 1);   // Compile-time error. y isn't in the first position


Named arguments are particularly useful in conjunction with optional parameters. 
For instance, consider the following method: 


void Bar (int a = 0, int b = 0, int c = 0, int d = 0) { ... }
We can call this supplying only a value for d, as follows: 
Bar (d:3); 


This is particularly useful when calling COM APIs, which we discuss in detail in 
Chapter 25. 
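The following sketch combines optional parameters with named arguments, using a Bar method with the same signature as above; its body, which returns the arguments as a string so the results can be checked, is an assumption for illustration:

```csharp
using System;

static class NamedOptionalDemo
{
  // Hypothetical body: echoes the four arguments so the binding is visible.
  public static string Bar (int a = 0, int b = 0, int c = 0, int d = 0) =>
    $"{a},{b},{c},{d}";

  static void Main()
  {
    Console.WriteLine (Bar (d: 3));         // 0,0,0,3
    Console.WriteLine (Bar (1, c: 5));      // 1,0,5,0
    Console.WriteLine (Bar (b: 2, a: 7));   // 7,2,0,0
  }
}
```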
Ref Locals


C# 7 added an esoteric feature, whereby you can define a local variable that references
an element in an array or field in an object:


int[] numbers = { 0, 1, 2, 3, 4 }; 
ref int numRef = ref numbers [2]; 


In this example, numRef is a reference to numbers[2]. When we modify numRef, we 
modify the array element: 


numRef *= 10; 
Console.WriteLine (numRef); // 20 
Console.WriteLine (numbers [2]); // 20 


The target for a ref local must be an array element, field, or local variable; it cannot 
be a property (Chapter 3). Ref locals are intended for specialized micro-optimization 
scenarios and are typically used in conjunction with ref returns. 


Ref Returns 


The Span<T> and ReadOnlySpan<T> types that we describe in 
Chapter 24 use ref returns to implement a highly efficient 
indexer. Outside such scenarios, ref returns are not commonly 
used; you can consider them a micro-optimization feature. 


You can return a ref local from a method. This is called a ref return: 


static string x = "Old Value"; 
static ref string GetX() => ref x; // This method returns a ref 


static void Main()
{
  ref string xRef = ref GetX();   // Assign result to a ref local
  xRef = "New Value";
  Console.WriteLine (x);          // New Value
}


If you omit the ref modifier on the calling side, it reverts to returning an ordinary 
value: 


string localX = GetX(); // Legal: localX is an ordinary non-ref variable. 
You also can use ref returns when defining a property or indexer: 

static ref string Prop => ref x; 
Such a property is implicitly writable, despite there being no set accessor: 

Prop = "New Value"; 
You can prevent such modification by using ref readonly: 


static ref readonly string Prop => ref x; 










The ref readonly modifier prevents modification while still enabling the performance
gain of returning by reference. The gain would be very small in this case,
because x is of type string (a reference type): no matter how long the string, the only
inefficiency that we can hope to avoid is the copying of a single 32- or 64-bit reference.
Real gains can occur with custom value types (see “Structs” on page 120 in
Chapter 3), but only if the struct is marked as readonly (otherwise, the compiler
will perform a defensive copy).


Attempting to define an explicit set accessor on a ref return property or indexer is 
illegal. 
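As an illustrative sketch of ref locals and ref returns working together, the hypothetical FindRef method below returns a reference to an array element, so the caller can update the array in place:

```csharp
using System;

static class RefReturnDemo
{
  // Returns a reference to (not a copy of) the first element equal to target.
  // A ref return must always refer to a variable, so a missing target throws.
  public static ref int FindRef (int[] numbers, int target)
  {
    for (int i = 0; i < numbers.Length; i++)
      if (numbers[i] == target)
        return ref numbers[i];
    throw new ArgumentException ("Value not found", nameof (target));
  }

  static void Main()
  {
    int[] numbers = { 0, 1, 2, 3, 4 };

    ref int slot = ref FindRef (numbers, 3);   // ref local aliasing numbers[3]
    slot = 30;
    Console.WriteLine (numbers[3]);            // 30

    int copy = FindRef (numbers, 4);           // no 'ref': an ordinary copy
    copy = 40;
    Console.WriteLine (numbers[4]);            // 4 (array unchanged)
  }
}
```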


var—Implicitly Typed Local Variables 


It is often the case that you declare and initialize a variable in one step. If the compiler
is able to infer the type from the initialization expression, you can use the keyword
var (introduced in C# 3.0) in place of the type declaration; for example:


var x = "hello"; 
var y = new System.Text.StringBuilder(); 
var z = (float)Math.PI; 


This is precisely equivalent to the following: 


string x = "hello"; 
System.Text.StringBuilder y = new System.Text.StringBuilder();
float z = (float)Math.PI; 


Because of this direct equivalence, implicitly typed variables are statically typed. For 
example, the following generates a compile-time error: 


var x = 5;
x = "hello";   // Compile-time error; x is of type int


var can decrease code readability in the case when you can’t 
deduce the type purely by looking at the variable declaration. 
For example: 


Random r = new Random(); 
var x = r.Next(); 


What type is x? 


In “Anonymous Types” on page 195 in Chapter 4, we will describe a scenario in 
which the use of var is mandatory. 


Expressions and Operators 


An expression essentially denotes a value. The simplest kinds of expressions are constants
and variables. Expressions can be transformed and combined using operators.
An operator takes one or more input operands to output a new expression.


Here is an example of a constant expression: 


12 
We can use the * operator to combine two operands (the literal expressions 12 and
30), as follows: 


12 * 30 


We can build complex expressions because an operand can itself be an expression, 
such as the operand (12 * 30) in the following example: 


1 + (12 * 30) 


Operators in C# can be classed as unary, binary, or ternary, depending on the number
of operands they work on (one, two, or three). The binary operators always use
infix notation, in which the operator is placed between the two operands.


Primary Expressions 


Primary expressions include expressions composed of operators that are intrinsic to 
the basic plumbing of the language. Here is an example: 


Math.Log (1) 


This expression is composed of two primary expressions. The first expression performs
a member lookup (with the . operator), and the second expression performs
a method call (with the () operator).


Void Expressions 
A void expression is an expression that has no value, such as this: 
Console.WriteLine (1) 


Because it has no value, you cannot use a void expression as an operand to build 
more complex expressions: 


1 + Console.WriteLine (1) // Compile-time error 


Assignment Expressions 


An assignment expression uses the = operator to assign the result of another expression
to a variable; for example:

x = x * 5

An assignment expression is not a void expression: it has a value of whatever was
assigned, and so can be incorporated into another expression. In the following
example, the expression assigns 2 to x and 10 to y:

y = 5 * (x = 2)

You can use this style of expression to initialize multiple values:

a = b = c = d = 0


The compound assignment operators are syntactic shortcuts that combine assignment
with another operator:

x *= 2    // equivalent to x = x * 2
x <<= 1   // equivalent to x = x << 1


(A subtle exception to this rule is with events, which we describe in Chapter 4: the 
+= and -= operators here are treated specially and map to the event’s add and remove 
accessors.) 
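A minimal sketch of assignment expressions producing values, including the compound forms above (the AssignmentDemo class and its Nested and Compound helpers are hypothetical):

```csharp
using System;

static class AssignmentDemo
{
  // The inner assignment (x = 2) is itself an expression whose value is 2.
  public static (int x, int y) Nested()
  {
    int x;
    int y = 5 * (x = 2);
    return (x, y);
  }

  // x *= 3 means x = x * 3; x <<= 1 means x = x << 1.
  public static int Compound (int x)
  {
    x *= 3;
    x <<= 1;
    return x;
  }

  static void Main()
  {
    var (x, y) = Nested();
    Console.WriteLine ($"{x} {y}");      // 2 10

    int a, b, c, d;
    a = b = c = d = 0;                   // d = 0 is evaluated first
    Console.WriteLine (a + b + c + d);   // 0

    Console.WriteLine (Compound (2));    // 12
  }
}
```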


Operator Precedence and Associativity 


When an expression contains multiple operators, precedence and associativity determine
the order of their evaluation. Operators with higher precedence execute before
operators of lower precedence. If the operators have the same precedence, the
operator’s associativity determines the order of evaluation.


Precedence 
The following expression: 
1 + 2 * 3
is evaluated as follows because * has a higher precedence than +: 


1 + (2 * 3) 


Left-associative operators 


Binary operators (except for assignment, lambda, and null-coalescing operators) are 
left-associative; in other words, they are evaluated from left to right. For example, 
the following expression: 


8/4/2 
is evaluated as follows: 


(8/4)/2   // 1


You can insert parentheses to change the actual order of evaluation: 


8/(4/2)   // 4


Right-associative operators 


The assignment operators, as well as the lambda, null coalescing, and conditional 
operators, are right-associative; in other words, they are evaluated from right to left. 
Right associativity allows multiple assignments such as the following to compile: 


x = y = 3;


This first assigns 3 to y and then assigns the result of that expression (3) to x. 
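A short sketch confirming the precedence and associativity rules above; the Divide3 helper is hypothetical and exists only so left associativity can be checked with non-constant operands:

```csharp
using System;

static class PrecedenceDemo
{
  // a / b / c parses as (a / b) / c because / is left-associative.
  public static int Divide3 (int a, int b, int c) => a / b / c;

  static void Main()
  {
    Console.WriteLine (1 + 2 * 3);           // 7: * has higher precedence than +
    Console.WriteLine ((1 + 2) * 3);         // 9
    Console.WriteLine (Divide3 (8, 4, 2));   // 1
    Console.WriteLine (8 / (4 / 2));         // 4

    string s = null, t = null;
    // ?? is right-associative: s ?? (t ?? "fallback")
    Console.WriteLine (s ?? t ?? "fallback");   // fallback

    int x, y;
    x = y = 3;                               // right-associative: y = 3 happens first
    Console.WriteLine (x + y);               // 6
  }
}
```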


Operator Table 


Table 2-3 lists C#’s operators in order of precedence. Operators in the same category 
have the same precedence. We explain user-overloadable operators in “Operator 
Overloading” on page 216 in Chapter 4. 
Table 2-3. C# operators (categories in order of precedence)

Category             Operator symbol   Operator name                        Example                                  User-overloadable
Primary              .                 Member access                        x.y                                      No
                     ?. and ?[]        Null-conditional                     x?.y or x?[0]                            No
                     -> (unsafe)       Pointer to struct                    x->y                                     No
                     ()                Function call                        x()                                      No
                     []                Array/index                          a[x]                                     Via indexer
                     ++                Post-increment                       x++                                      Yes
                     --                Post-decrement                       x--                                      Yes
                     new               Create instance                      new Foo()                                No
                     stackalloc        Unsafe stack allocation              stackalloc(10)                           No
                     typeof            Get type from identifier             typeof(int)                              No
                     nameof            Get name of identifier               nameof(x)                                No
                     checked           Integral overflow check on           checked(x)                               No
                     unchecked         Integral overflow check off          unchecked(x)                             No
                     default           Default value                        default(char)                            No
Unary                await             Await                                await myTask                             No
                     sizeof            Get size of struct                   sizeof(int)                              No
                     +                 Positive value of                    +x                                       Yes
                     -                 Negative value of                    -x                                       Yes
                     !                 Not                                  !x                                       Yes
                     ~                 Bitwise complement                   ~x                                       Yes
                     ++                Pre-increment                        ++x                                      Yes
                     --                Pre-decrement                        --x                                      Yes
                     ()                Cast                                 (int)x                                   No
                     * (unsafe)        Value at address                     *x                                       No
                     & (unsafe)        Address of value                     &x                                       No
Range                ..                Start and end of a range of indices  x..y                                     No
Multiplicative       *                 Multiply                             x * y                                    Yes
                     /                 Divide                               x / y                                    Yes
                     %                 Remainder                            x % y                                    Yes
Additive             +                 Add                                  x + y                                    Yes
                     -                 Subtract                             x - y                                    Yes
Shift                <<                Shift left                           x << 1                                   Yes
                     >>                Shift right                          x >> 1                                   Yes
Relational           <                 Less than                            x < y                                    Yes
                     >                 Greater than                         x > y                                    Yes
                     <=                Less than or equal to                x <= y                                   Yes
                     >=                Greater than or equal to             x >= y                                   Yes
                     is                Type is or is subclass of            x is y                                   No
                     as                Type conversion                      x as y                                   No
Equality             ==                Equals                               x == y                                   Yes
                     !=                Not equals                           x != y                                   Yes
Logical And          &                 And                                  x & y                                    Yes
Logical Xor          ^                 Exclusive Or                         x ^ y                                    Yes
Logical Or           |                 Or                                   x | y                                    Yes
Conditional And      &&                Conditional And                      x && y                                   Via &
Conditional Or       ||                Conditional Or                       x || y                                   Via |
Null coalescing      ??                Null coalescing                      x ?? y                                   No
Conditional          ?:                Conditional                          isTrue ? thenThisValue : elseThisValue   No
Assignment & Lambda  =                 Assign                               x = y                                    No
                     *=                Multiply self by                     x *= 2                                   Via *
                     /=                Divide self by                       x /= 2                                   Via /
                     +=                Add to self                          x += 2                                   Via +
                     -=                Subtract from self                   x -= 2                                   Via -
                     <<=               Shift self left by                   x <<= 2                                  Via <<
                     >>=               Shift self right by                  x >>= 2                                  Via >>
                     &=                And self by                          x &= 2                                   Via &
                     ^=                Exclusive-Or self by                 x ^= 2                                   Via ^
                     |=                Or self by                           x |= 2                                   Via |
                     ??=               Null-coalescing assignment           x ??= 0                                  No
                     =>                Lambda                               x => x + 1                               No





Null Operators 


C# provides three operators to make it easier to work with nulls: the null-coalescing 
operator, the null-coalescing assignment operator, and the null-conditional operator. 


Null-Coalescing Operator 


The ?? operator is the null-coalescing operator. It says, “If the operand to the left is 
non-null, give it to me; otherwise, give me another value.” For example: 


string s1 = null;
string s2 = s1 ?? "nothing"; // s2 evaluates to "nothing" 


If the lefthand expression is non-null, the righthand expression is never evaluated. 
The null-coalescing operator also works with nullable value types (see “Nullable 
Value Types” on page 185 in Chapter 4). 
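The following sketch shows that the righthand side runs only when needed. The hypothetical Fallback method increments a counter each time it is called. OrZero shows the operator with a nullable value type on the left.

```csharp
static class NullCoalescingDemo
{
    public static int FallbackCalls;   // how many times the righthand side has run

    static string Fallback()
    {
        FallbackCalls++;
        return "nothing";
    }

    // Fallback() is evaluated only when s is null
    public static string OrDefault(string s) => s ?? Fallback();

    // int? on the left, int on the right: the result type is int
    public static int OrZero(int? n) => n ?? 0;
}
```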


Null-Coalescing Assignment Operator (C# 8) 


The ??= operator is the null-coalescing assignment operator. It says, “If the operand 
to the left is null, assign the right operand to the left operand.” For example: 


string s1 = null;
s1 ??= "something";
Console.WriteLine (s1);    // something

s1 ??= "everything";
Console.WriteLine (s1);    // something


The operator is useful to replace the pattern 
if (myVariable == null) myVariable = someDefault; 
with: 


myVariable ??= someDefault; 
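A common use of ??= is lazy initialization: a field stays null until it is first needed. The hypothetical Basket class below is a minimal sketch of that pattern.

```csharp
using System.Collections.Generic;

// Hypothetical class showing lazy initialization with ??=
class Basket
{
    List<string> items;   // null until the first Add

    public void Add(string item)
    {
        items ??= new List<string>();   // create the list only if it is still null
        items.Add(item);
    }

    public int Count => items?.Count ?? 0;   // 0 while the list has not been created
}
```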


Null-Conditional Operator 


The ?. operator is the null-conditional or “Elvis” operator (after the Elvis emoticon). 
It allows you to call methods and access members just like the standard dot operator 
except that if the operand on the left is null, the expression evaluates to null instead 
of throwing a NullReferenceException: 


System.Text.StringBuilder sb = null;
string s = sb?.ToString(); // No error; s instead evaluates to null 
The last line is equivalent to the following:
string s = (sb == null ? null : sb.ToString()); 


Upon encountering a null, the Elvis operator short-circuits the remainder of the 
expression. In the following example, s evaluates to null, even with a standard dot 
operator between ToString() and ToUpper(): 


System.Text.StringBuilder sb = null; 
string s = sb?.ToString().ToUpper(); // s evaluates to null without error 


Repeated use of Elvis is necessary only if the operand immediately to its left might 
be null. The following expression is robust to both x being null and x.y being null: 


x?.y?.z
It is equivalent to the following (except that x.y is evaluated only once): 


x == null ? null 
: (x.y == null ? null : x.y.z) 


The final expression must be capable of accepting a null. The following is illegal: 


System.Text.StringBuilder sb = null;
int length = sb?.ToString().Length; // Illegal : int cannot be null 


We can fix this with the use of nullable value types (see “Nullable Value Types” on 
page 185 in Chapter 4). If you're already familiar with nullable value types, here’s a 
preview: 


int? length = sb?.ToString().Length;   // OK: int? can be null
You can also use the null-conditional operator to call a void method: 
someObject?.SomeVoidMethod(); 


If someObject is null, this becomes a “no-operation” rather than throwing a NullReferenceException.


You can use the null-conditional operator with the commonly used type members 
that we describe in Chapter 3, including methods, fields, properties, and indexers. It 
also combines well with the null-coalescing operator: 


System.Text.StringBuilder sb = null; 
string s = sb?.ToString() ?? "nothing"; // s evaluates to "nothing" 
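The following sketch puts these pieces together. A null-conditional chain that ends in an int yields an int?. Adding ?? converts it back to an int. The ?[] form guards an indexer in the same way. The method names are illustrative.

```csharp
using System.Text;

static class NullConditionalDemo
{
    // The chain may short-circuit to null, so the result type is int?
    public static int? LengthOrNull(StringBuilder sb) => sb?.ToString().Length;

    // ?? supplies a fallback, turning int? back into int
    public static int LengthOrZero(StringBuilder sb) => sb?.ToString().Length ?? 0;

    // Indexer form: a null array yields null instead of throwing
    public static int? FirstOrNull(int[] values) => values?[0];
}
```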


Statements 


Functions comprise statements that execute sequentially in the textual order in 
which they appear. A statement block is a series of statements appearing between 
braces (the {} tokens). 
Declaration Statements


A declaration statement declares a new variable, optionally initializing the variable 
with an expression. A declaration statement ends in a semicolon. You may declare 
multiple variables of the same type in a comma-separated list: 


string someWord = "rosebud"; 
int someNumber = 42; 
bool rich = true, famous = false; 


A constant declaration is like a variable declaration except that it cannot be changed 
after it has been declared, and the initialization must occur with the declaration (see 
“Constants” on page 90 in Chapter 3): 


const double c = 2.99792458E08; 
c += 10; // Compile-time error 
Local variables 


The scope of a local variable or local constant extends throughout the current block. 
You cannot declare another local variable with the same name in the current block 
or in any nested blocks: 


static void Main() 


{ 
int x; 
{ 
int y; 
int x; // Error - x already defined 
} 
{ 
int y; // OK - y not in scope 
} 
Console.Write (y); // Error - y is out of scope 
} 


A variable’s scope extends in both directions throughout its code block. This means that if we moved the initial declaration of x in this example to the bottom of the method, we'd get the same error. This is in contrast to C++ and is somewhat peculiar, given that it’s not legal to refer to a variable or constant before it’s declared.


Expression Statements 


Expression statements are expressions that are also valid statements. An expression 
statement must either change state or call something that might change state. 
Changing state essentially means changing a variable. Following are the possible 
expression statements: 
• Assignment expressions (including increment and decrement expressions)
• Method call expressions (both void and nonvoid)
• Object instantiation expressions


Here are some examples: 


// Declare variables with declaration statements:
string s;
int x, y;
System.Text.StringBuilder sb;

// Expression statements
x = 1 + 2;                   // Assignment expression
x++;                         // Increment expression
y = Math.Max (x, 5);         // Assignment expression
Console.WriteLine (y);       // Method call expression
sb = new StringBuilder();    // Assignment expression
new StringBuilder();         // Object instantiation expression


When you call a constructor or a method that returns a value, you're not obliged to 
use the result. However, unless the constructor or method changes state, the state- 
ment is completely useless: 


new StringBuilder(); // Legal, but useless 

new string ('c', 3); // Legal, but useless 

x.Equals (y); // Legal, but useless 
Selection Statements 


C# has the following mechanisms to conditionally control the flow of program 
execution: 

• Selection statements (if, switch)
• Conditional operator (?:)
• Loop statements (while, do-while, for, foreach)


This section covers the simplest two constructs: the if statement and the switch 
statement. 


The if statement 
An if statement executes a statement if a bool expression is true:


if (5 < 2 * 3) 
Console.WriteLine ("true"); // true 


The statement can be a code block: 


if (5 < 2 * 3)
{
  Console.WriteLine ("true");
  Console.WriteLine ("Let's move on!");
}


The else clause 
An if statement can optionally feature an else clause: 


if (2 + 2 == 5)
  Console.WriteLine ("Does not compute");
else
  Console.WriteLine ("False");        // False





Within an else clause, you can nest another if statement: 


if (2 + 2 == 5) 
Console.WriteLine ("Does not compute"); 
else 
if (2 + 2 == 4) 
Console.WriteLine ("Computes"); // Computes 


Changing the flow of execution with braces 


An else clause always applies to the immediately preceding if statement in the 
statement block: 


if (true) 
if (false) 
Console.WriteLine(); 
else 
Console.WriteLine ("executes"); 


This is semantically identical to the following: 


if (true) 
{ 
if (false) 
Console.WriteLine(); 
else 
Console.WriteLine ("executes"); 


} 


We can change the execution flow by moving the braces: 


if (true) 
{ 
if (false) 
Console.WriteLine(); 


} 


else 
Console.WriteLine ("does not execute"); 


With braces, you explicitly state your intention. This can improve the readability of 
nested if statements—even when not required by the compiler. A notable exception 
is with the following pattern: 


static void TellMeWhatICanDo (int age) 
{ 
if (age >= 35)
Console.WriteLine ("You can be president!"); 
else if (age >= 21) 
Console.WriteLine ("You can drink!"); 
else if (age >= 18) 
Console.WriteLine ("You can vote!"); 
else 
Console.WriteLine ("You can wait!"); 


} 


Here, we've arranged the if and else statements to mimic the elseif construct of 
other languages (and C#’s #elif preprocessor directive). Visual Studio’s auto- 
formatting recognizes this pattern and preserves the indentation. Semantically, 
though, each if statement following an else statement is functionally nested within 
the else clause. 


The switch statement 


switch statements let you branch program execution based on a selection of possi- 
ble values that a variable might have. switch statements can result in cleaner code 
than multiple if statements because switch statements require an expression to be 
evaluated only once: 


static void ShowCard (int cardNumber) 
{ 
switch (cardNumber)
{ 
case 13: 
Console.WriteLine ("King"); 
break; 
case 12: 
Console.WriteLine ("Queen"); 
break; 
case 11: 
Console.WriteLine ("Jack"); 
break; 
case -1: // Joker is -1 
goto case 12; // In this game joker counts as queen 
default: // Executes for any other cardNumber 
Console.WriteLine (cardNumber);
break; 
} 
} 


This example demonstrates the most common scenario, which is switching on con- 
stants. When you specify a constant, you're restricted to the built-in integral types, 
bool, char, enum types, and the string type. 


At the end of each case clause, you must specify explicitly where execution is to go 
next, with some kind of jump statement (unless your code ends in an infinite loop). 
Here are the options: 
• break (jumps to the end of the switch statement)
• goto case x (jumps to another case clause)
• goto default (jumps to the default clause)
• Any other jump statement: return, throw, continue, or goto label
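The jump options listed above can be sketched in a single switch. Each case clause ends in return, goto case, goto default or throw rather than break. The Describe method and its return values are hypothetical.

```csharp
using System;

static class SwitchJumpDemo
{
    public static string Describe(int code)
    {
        switch (code)
        {
            case 0:
                return "zero";                  // return exits the method, ending the case
            case 1:
                goto default;                   // jump to the default clause
            case 2:
                goto case 0;                    // jump to another case clause
            case -1:
                throw new ArgumentOutOfRangeException(nameof(code));
            default:
                return "other";
        }
    }
}
```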


When more than one value should execute the same code, you can list the common 
cases sequentially: 
switch (cardNumber)
{ 
case 13: 
case 12: 
case 11: 
Console.WriteLine ("Face card"); 
break; 
default: 
Console.WriteLine ("Plain card"); 
break; 





} 


This feature of a switch statement can be pivotal in terms of producing cleaner 
code than multiple if-else statements. 


Switching on types 


Switching on a type is a special case of switching on a pattern. 
A number of other (moderately useful) patterns were intro- 
duced in C# 7 and C# 8; see “Patterns” on page 201 in Chap- 
ter 4 for a full discussion. 


From C# 7, you can also switch on types: 


static void Main() 

{ 
TellMeTheType (12); 
TellMeTheType ("hello"); 
TellMeTheType (true); 

} 


static void TellMeTheType (object x)   // object allows any type.
{
  switch (x)
  {
    case int i:
      Console.WriteLine ("It's an int!");
      Console.WriteLine ($"The square of {i} is {i * i}");
      break;
    case string s:
      Console.WriteLine ("It's a string");
      Console.WriteLine ($"The length of {s} is {s.Length}");
      break;
    default:
      Console.WriteLine ("I don't know what x is");
      break;
  }
}


(The object type allows for a variable of any type; we discuss this fully in “Inheri- 
tance” on page 106 and “The object Type” on page 116 in Chapter 3.) 


Each case clause specifies a type upon which to match, and a variable upon which to 
assign the typed value if the match succeeds (the “pattern” variable). Unlike with 
constants, there's no restriction on what types you can use. 


You can predicate a case with the when keyword: 


switch (x) 
{ 
case bool b when b == true: // Fires only when b is true 
Console.WriteLine ("True!"); 
break; 


case bool b: 
Console.WriteLine ("False!"); 
break; 


} 


The order of the case clauses can matter when switching on type (unlike when 
switching on constants). This example would give a different result if we reversed 
the two cases (in fact, it would not even compile, because the compiler would deter- 
mine that the second case is unreachable). An exception to this rule is the default 
clause, which is always executed last, regardless of where it appears. 
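The ordering rules above can be illustrated with a small sketch. The more specific int case (with a when clause) must come before the general int case. The default clause is written first here, yet it still runs only when nothing else matches. The method name and strings are illustrative.

```csharp
static class TypeSwitchDemo
{
    public static string Classify(object x)
    {
        switch (x)
        {
            default:                        // written first, but evaluated last
                return "something else";
            case null:
                return "null";
            case int i when i < 0:          // must precede the plain int case
                return "negative int";
            case int i:
                return "int " + i;
            case string s:
                return $"string of length {s.Length}";
        }
    }
}
```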


If you want to switch on a type, but are uninterested in its value, you can use a 
discard (_): 


case DateTime _:
Console.WriteLine ("It's a DateTime"); 


You can stack multiple case clauses. The Console.WriteLine in the following code 
will execute for any floating-point type greater than 1,000: 


switch (x) 


{ 
case float f when f > 1000: 


case double d when d > 1000: 

case decimal m when m > 1000: 
Console.WriteLine ("We can refer to x here but not f or d or m"); 
break; 


} 


In this example, the compiler lets us consume the pattern variables f, d, and m only in the when clauses. When we call Console.WriteLine, it's unknown which of those three variables will be assigned, so the compiler puts all of them out of scope.


You can mix and match constants and patterns in the same switch statement. And 
you can also switch on the null value: 
case null:
Console.WriteLine ("Nothing here"); 
break; 


switch expressions (C# 8) 


From C# 8, you can use switch in the context of an expression. Assuming that 
cardNumber is of type int, the following illustrates its use: 


string cardName = cardNumber switch
{
  13 => "King",
  12 => "Queen",
  11 => "Jack",
  _ => "Pip card"     // equivalent to 'default'
};

Notice that the switch keyword appears after the variable name, and that the case 
clauses are expressions (terminated by commas) rather than statements. switch 
expressions are more compact than their switch statement counterparts, and you 
can use them in LINQ queries (Chapter 8). 


If you omit the default expression (_) and the switch fails to match, an exception is 
thrown. 
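The following sketch contrasts a switch expression that has a discard arm with one that does not. Name always returns a value. FaceOnly throws at run time for any card that isn't a face card, and the compiler warns that the switch isn't exhaustive. The exact exception type depends on the runtime, so the test below catches the general Exception type.

```csharp
static class CardNames
{
    public static string Name(int cardNumber) => cardNumber switch
    {
        13 => "King",
        12 => "Queen",
        11 => "Jack",
        _  => "Pip card"        // discard: matches any remaining value
    };

    // No discard arm: an unmatched value causes an exception at run time
    public static string FaceOnly(int cardNumber) => cardNumber switch
    {
        13 => "King",
        12 => "Queen",
        11 => "Jack",
    };
}
```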


You can also switch on multiple values (the tuple pattern): 


int cardNumber = 12;
string suit = "spades";

string cardName = (cardNumber, suit) switch
{
  (13, "spades") => "King of spades",
  (13, "clubs") => "King of clubs",
};


Many more options are possible through the use of patterns (see “Patterns” on page 
201 in Chapter 4). 
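As a small preview of those options, discards can appear inside a tuple pattern to match "any value" in one position. Arms are tried from top to bottom. The Describe method below is a hypothetical sketch.

```csharp
static class TupleSwitchDemo
{
    public static string Describe(int cardNumber, string suit) => (cardNumber, suit) switch
    {
        (13, "spades") => "King of spades",
        (13, "clubs")  => "King of clubs",
        (13, _)        => "King",               // a king of any other suit
        (_, "hearts")  => "Some heart",         // any number, hearts only
        _              => "Some other card"
    };
}
```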


Iteration Statements 

C# enables a sequence of statements to execute repeatedly with the while, do-while, 
for, and foreach statements. 

while and do-while loops 


while loops repeatedly execute a body of code while a bool expression is true. The 
expression is tested before the body of the loop is executed: 


int i = 0;
while (i < 3)
{
  Console.WriteLine (i);
  i++;
}


OUTPUT: 
0 
1 
2 


do-while loops differ in functionality from while loops only in that they test the 
expression after the statement block has executed (ensuring that the block is always 
executed at least once). Here’s the preceding example rewritten with a do-while 
loop: 


int i = 0;
do
{
  Console.WriteLine (i);
  i++;
}
while (i < 3);
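The difference between the two loops shows up only when the condition is false from the start. This sketch counts how many times each loop body runs for a given starting value. The method names are illustrative.

```csharp
static class LoopDemo
{
    public static int WhileCount(int start)
    {
        int runs = 0, i = start;
        while (i < 3) { runs++; i++; }        // condition tested before the body
        return runs;
    }

    public static int DoWhileCount(int start)
    {
        int runs = 0, i = start;
        do { runs++; i++; } while (i < 3);    // body runs once before the first test
        return runs;
    }
}
```

Starting at 0, both loops run three times. Starting at 5, the while loop never runs, but the do-while loop runs once.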


for loops 


for loops are like while loops with special clauses for initialization and iteration of a 
loop variable. A for loop contains three clauses as follows: 


for (initialization-clause; condition-clause; iteration-clause)
  statement-or-statement-block


Here’s what each clause does: 


Initialization clause 
Executed before the loop begins; used to initialize one or more iteration 
variables 


Condition clause 
The bool expression that, while true, will execute the body 


Iteration clause 
Executed after each iteration of the statement block; typically used to update 
the iteration variable 


For example, the following prints the numbers 0 through 2: 


for (int i = 0; i < 3; i++) 
Console.WriteLine (i); 


The following prints the first 10 Fibonacci numbers (in which each number is the 
sum of the previous two): 


for (int i = 0, prevFib = 1, curFib = 1; i < 10; i++)
{
  Console.WriteLine (prevFib);
  int newFib = prevFib + curFib;
  prevFib = curFib; curFib = newFib;
}
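To check the output of this loop, the following sketch uses the same for statement but collects the values into a list instead of printing them. The FibonacciDemo.First method is illustrative.

```csharp
using System.Collections.Generic;

static class FibonacciDemo
{
    public static List<int> First(int count)
    {
        var result = new List<int>();
        // Initialization clause declares three variables; iteration clause increments i
        for (int i = 0, prevFib = 1, curFib = 1; i < count; i++)
        {
            result.Add(prevFib);
            int newFib = prevFib + curFib;
            prevFib = curFib; curFib = newFib;
        }
        return result;
    }
}
```

First(10) yields 1, 1, 2, 3, 5, 8, 13, 21, 34, 55.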
Any of the three parts of the for statement can be omitted. You can implement an 
infinite loop such as the following (though while(true) can be used, instead): 


for (;;)
Console.WriteLine ("interrupt me"); 


foreach loops 


The foreach statement iterates over each element in an enumerable object. Most of 
the types in C# and .NET Core that represent a set or list of elements are enumera- 
ble. For example, both an array and a string are enumerable. Here is an example of 
enumerating over the characters in a string, from the first character through to the 
last: 


foreach (char c in "beer") // c is the iteration variable 
Console.WriteLine (c); 


OUTPUT:
b
e
e
r


We define enumerable objects in “Enumeration and Iterators” on page 179 in 
Chapter 4. 


Jump Statements 
The C# jump statements are break, continue, goto, return, and throw. 


Jump statements obey the reliability rules of try statements (see “try Statements and Exceptions” on page 170 in Chapter 4). This means that:

• A jump out of a try block always executes the try’s finally block before reaching the target of the jump.

• A jump cannot be made from the inside to the outside of a finally block (except via throw).
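The first rule can be observed by breaking out of a loop from inside a try block. The sketch below records each step in a list, so you can see that the finally block runs before control reaches the code after the loop. The names are illustrative.

```csharp
using System.Collections.Generic;

static class JumpFinallyDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        for (int i = 0; i < 3; i++)
        {
            try
            {
                log.Add("try " + i);
                if (i == 1) break;          // jumping out of the try block...
            }
            finally
            {
                log.Add("finally " + i);    // ...still executes finally first
            }
        }
        log.Add("after loop");
        return log;
    }
}
```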


The break statement 


The break statement ends the execution of the body of an iteration or switch 
statement: 


int x = 0;
while (true)
{
  if (x++ > 5)
    break;        // break from the loop
}
// execution continues here after break


The continue statement 


The continue statement forgoes the remaining statements in a loop and makes an 
early start on the next iteration. The following loop skips even numbers: 


for (int i = 0; i < 10; i++) 


{ 
  if ((i % 2) == 0)      // If i is even,
continue; // continue with next iteration 
Console.Write (i +" "); 
} 


OUTPUT: 1 3 5 7 9


The goto statement 


The goto statement transfers execution to another label within a statement block. 
The form is as follows: 


goto statement-label;
Or, when used within a switch statement: 
goto case case-constant; // (Only works with constants, not patterns) 


A label is a placeholder in a code block that precedes a statement, denoted with a 
colon suffix. The following iterates the numbers 1 through 5, mimicking a for loop: 
int i = 1;
startLoop:
if (i <= 5)
{
  Console.Write (i + " ");
  i++;
  goto startLoop;
}

OUTPUT: 1 2 3 4 5


The goto case case-constant transfers execution to another case in a switch 
block (see “The switch statement” on page 74). 


The return statement 


The return statement exits the method and must return an expression of the meth- 
od’s return type if the method is nonvoid: 


static decimal AsPercentage (decimal d) 


{ 
decimal p = d * 100m; 
return p;    // Return to the calling method with value


} 


A return statement can appear anywhere in a method (except in a finally block), 
and can be used more than once. 


The throw statement 


The throw statement throws an exception to indicate an error has occurred (see “try 
Statements and Exceptions” on page 170 in Chapter 4): 


if (w == null) 
throw new ArgumentNullException (...); 
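A complete version of this fragment might look like the following sketch. The WordLength method is hypothetical. nameof(w) records the offending parameter's name in the exception's ParamName property.

```csharp
using System;

static class ThrowDemo
{
    // Hypothetical method: a null argument is reported by throwing
    public static int WordLength(string w)
    {
        if (w == null)
            throw new ArgumentNullException(nameof(w));
        return w.Length;
    }
}
```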


Miscellaneous Statements 


The using statement provides an elegant syntax for calling Dispose on objects that 
implement IDisposable, within a finally block (see “try Statements and Excep- 
tions” on page 170 in Chapter 4 and “IDisposable, Dispose, and Close” on page 523 
in Chapter 12). 
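The following sketch shows that Dispose runs whether the block completes normally or throws. The hypothetical LoggedResource class records when it is opened and disposed. The compiler expands the using statement into (roughly) a try block with Dispose called in the finally block.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable type that logs its lifecycle
class LoggedResource : IDisposable
{
    readonly List<string> log;

    public LoggedResource(List<string> log)
    {
        this.log = log;
        log.Add("open");
    }

    public void Dispose() => log.Add("dispose");
}

static class UsingStatementDemo
{
    public static List<string> Run(bool fail)
    {
        var log = new List<string>();
        try
        {
            using (var r = new LoggedResource(log))   // Dispose is called from a finally block
            {
                log.Add("work");
                if (fail) throw new InvalidOperationException();
            }
        }
        catch (InvalidOperationException)
        {
            log.Add("caught");                        // Dispose has already run by now
        }
        return log;
    }
}
```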


C# overloads the using keyword to have independent mean- 
ings in different contexts. Specifically, the using directive is 
different from the using statement. 


The lock statement is a shortcut for calling the Enter and Exit methods of the 
Monitor class (see Chapters 14 and 23). 
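The following sketch shows a lock statement next to a hand-written equivalent that calls Monitor.Enter and Monitor.Exit directly. The manual version approximates what the compiler generates, using the Monitor.Enter overload that takes a ref bool lockTaken argument. The Counter class is hypothetical.

```csharp
using System.Threading;

// Hypothetical thread-safe counter
class Counter
{
    readonly object locker = new object();
    int count;

    public void Increment()
    {
        lock (locker)          // shortcut for the Monitor pattern below
        {
            count++;
        }
    }

    // Roughly what the lock statement above expands to
    public void IncrementWithMonitor()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(locker, ref lockTaken);
            count++;
        }
        finally
        {
            if (lockTaken) Monitor.Exit(locker);
        }
    }

    public int Count
    {
        get { lock (locker) return count; }
    }
}
```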


Namespaces 


A namespace is a domain for type names. Types are typically organized into hier- 
archical namespaces, making them easier to find and avoiding conflicts. For exam- 
ple, the RSA type that handles public-key encryption is defined within the following 
namespace: 


System.Security.Cryptography 


A namespace forms an integral part of a type’s name. The following code calls RSA’s 
Create method: 


System.Security.Cryptography.RSA rsa = 
  System.Security.Cryptography.RSA.Create();


Namespaces are independent of assemblies, which are units of 
deployment such as an .exe or .dll (described in Chapter 18). 


Namespaces also have no impact on member visibility (public, internal, private, and so on).


The namespace keyword defines a namespace for types within that block; for
example: 
namespace Outer.Middle.Inner
{
  class Class1 {}
  class Class2 {}
}


The dots in the namespace indicate a hierarchy of nested namespaces. The code that 
follows is semantically identical to the preceding example: 


namespace Outer
{
  namespace Middle
  {
    namespace Inner
    {
      class Class1 {}
      class Class2 {}
    }
  }
}


You can refer to a type with its fully qualified name, which includes all namespaces 
from the outermost to the innermost. For example, we could refer to Class1 in the 
preceding example as Outer.Middle.Inner.Class1.


Types not defined in any namespace are said to reside in the global namespace. The 
global namespace also includes top-level namespaces, such as Outer in our example. 
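Reflection offers a quick way to confirm that the namespace is part of a type's full name. In the sketch below, GlobalType is a hypothetical type declared outside any namespace. Type.Namespace returns null for such a type.

```csharp
namespace Outer.Middle.Inner
{
    class Class1 { }
}

class GlobalType { }   // not inside any namespace: lives in the global namespace

static class NamespaceDemo
{
    // The fully qualified name includes every enclosing namespace
    public static string FullNameOfClass1() => typeof(Outer.Middle.Inner.Class1).FullName;

    // A type in the global namespace has no namespace name
    public static string NamespaceOfGlobalType() => typeof(GlobalType).Namespace;
}
```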


The using Directive 


The using directive imports a namespace, allowing you to refer to types without 
their fully qualified names. The following imports the previous example’s 
Outer.Middle.Inner namespace:


using Outer.Middle.Inner;

class Test
{
  static void Main()
  {
    Class1 c;    // Don't need fully qualified name
  }
}


It’s legal (and often desirable) to define the same type name in different namespaces. However, you'd typically do so only if it was unlikely for a consumer to want to import both namespaces at once. A good example is the TextBox class, which is defined both in System.Windows.Controls (WPF) and System.Windows.Forms (Windows Forms).
using static


The using static directive imports a type rather than a namespace. All static 
members of the imported type can then be used without qualification. In the follow- 
ing example, we call the Console class's static WriteLine method without needing to 
refer to the type: 


using static System.Console; 


class Test 


{ 
static void Main() { WriteLine ("Hello"); } 


} 


The using static directive imports all accessible static members of the type, 
including fields, properties, and nested types (Chapter 3). You can also apply this 
directive to enum types, in which case their members are imported. So, if we import 
the following enum type: 


using static System.Windows.Visibility; 
we can specify Hidden instead of Visibility.Hidden:
var textBox = new TextBox { Visibility = Hidden }; // XAML-style 
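The same directive works for any type with static members. This sketch imports System.Math, so Sqrt, Max and PI can be called without the Math. prefix. The wrapper class is illustrative.

```csharp
using static System.Math;

static class UsingStaticDemo
{
    // Sqrt, Max and PI are static members of System.Math
    public static double Hypotenuse(double a, double b) => Sqrt(a * a + b * b);
    public static int Larger(int a, int b) => Max(a, b);
    public static double Circumference(double radius) => 2 * PI * radius;
}
```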


Should an ambiguity arise between multiple static imports, the C# compiler is not 
smart enough to infer the correct type from the context and will generate an error. 


Rules Within a Namespace 


Name scoping 


You can use names declared in outer namespaces unqualified within inner name- 
spaces. In this example, Class1 does not need qualification within Inner: 


namespace Outer
{
  class Class1 {}

  namespace Inner
  {
    class Class2 : Class1 {}
  }
}


If you want to refer to a type in a different branch of your namespace hierarchy, you 
can use a partially qualified name. In the following example, we base SalesReport 
on Common.ReportBase:


namespace MyTradingCompany
{
  namespace Common
  {
    class ReportBase {}
  }
  namespace ManagementReporting
  {
    class SalesReport : Common.ReportBase {}
  }
}


Name hiding 


If the same type name appears in both an inner and an outer namespace, the inner 
name wins. To refer to the type in the outer namespace, you must qualify its name: 


namespace Outer
{
  class Foo { }

  namespace Inner
  {
    class Foo { }

    class Test
    {
      Foo f1;          // = Outer.Inner.Foo
      Outer.Foo f2;    // = Outer.Foo
    }
  }
}
All type names are converted to fully qualified names at com- 
pile time. Intermediate Language (IL) code contains no 
unqualified or partially qualified names. 
Repeated namespaces 


You can repeat a namespace declaration, as long as the type names within the name- 
spaces don't conflict: 


namespace Outer.Middle.Inner
{
  class Class1 {}
}

namespace Outer.Middle.Inner
{
  class Class2 {}
}


We can even break the example into two source files such that we could compile 
each class into a different assembly. 


Source file 1: 


namespace Outer.Middle.Inner
{
  class Class1 {}
}

Source file 2:

namespace Outer.Middle.Inner
{
  class Class2 {}
}


Nested using directives 


You can nest a using directive within a namespace. This allows you to scope the 
using directive within a namespace declaration. In the following example, Class1 is 
visible in one scope, but not in another: 


namespace N1
{
  class Class1 {}
}

namespace N2
{
  using N1;

  class Class2 : Class1 {}
}

namespace N2
{
  class Class3 : Class1 {}   // Compile-time error
}


Aliasing Types and Namespaces 


Importing a namespace can result in type-name collision. Rather than importing 
the entire namespace, you can import just the specific types that you need, giving 
each type an alias: 


using PropertyInfo2 = System.Reflection.PropertyInfo; 
class Program { PropertyInfo2 p; } 


An entire namespace can be aliased, as follows: 


using R = System.Reflection; 
class Program { R.PropertyInfo p; } 
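The following self-contained sketch combines both kinds of alias. SB names a single type, and Coll names a whole namespace. The alias names and the Join method are illustrative.

```csharp
using SB = System.Text.StringBuilder;       // alias for one type
using Coll = System.Collections.Generic;    // alias for an entire namespace

static class AliasDemo
{
    public static string Join(Coll.List<string> parts)
    {
        var sb = new SB();                  // same as new System.Text.StringBuilder()
        foreach (var part in parts) sb.Append(part);
        return sb.ToString();
    }
}
```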


Advanced Namespace Features 


Extern 


Extern aliases allow your program to reference two types with the same fully qualified name (i.e., the namespace and type name are identical). This is an unusual scenario and can occur only when the two types come from different assemblies. Consider the following example.


Library 1, compiled to Widgets1.dll:

namespace Widgets
{
  public class Widget {}
}

Library 2, compiled to Widgets2.dll:

namespace Widgets
{
  public class Widget {}
}

Application, which references Widgets1.dll and Widgets2.dll:

using Widgets;

class Test
{
  static void Main()
  {
    Widget w = new Widget();
  }
}


The application cannot compile, because Widget is ambiguous. Extern aliases can 
resolve the ambiguity. The first step is to modify the application's .csproj file, assign- 
ing a unique alias to each reference: 


<ItemGroup> 
<Reference Include="Widgets1"> 
<Aliases>W1</Aliases> 
</Reference> 
<Reference Include="Widgets2"> 
<Aliases>W2</Aliases> 
</Reference> 
</ItemGroup> 


The second step is to use the extern alias directive: 


extern alias W1; 
extern alias W2; 


class Test
{
  static void Main()
  {
    W1.Widgets.Widget w1 = new W1.Widgets.Widget();
    W2.Widgets.Widget w2 = new W2.Widgets.Widget();
  }
}
Namespace alias qualifiers


As we mentioned earlier, names in inner namespaces hide names in outer namespa- 
ces. However, sometimes even the use of a fully qualified type name does not resolve 
the conflict. Consider the following example: 


namespace N
{
  class A
  {
    static void Main() => new A.B();    // Instantiate class B
    public class B {}                   // Nested type
  }
}
namespace A
{
  class B {}
}


The Main method could be instantiating either the nested class B, or the class B 
within the namespace A. The compiler always gives higher precedence to identifiers 
in the current namespace—in this case, the nested B class. 


To resolve such conflicts, a namespace name can be qualified, relative to one of the 
following: 


• The global namespace, which is the root of all namespaces (identified with the contextual keyword global)

• The set of extern aliases


The :: token performs namespace alias qualification. In this example, we qualify 
using the global namespace (this is most commonly seen in autogenerated code to 
avoid name conflicts): 


namespace N
{
  class A
  {
    static void Main()
    {
      System.Console.WriteLine (new A.B());
      System.Console.WriteLine (new global::A.B());
    }

    public class B {}
  }
}

namespace A
{
  class B {}
}
Here is an example of qualifying with an alias (adapted from the example in
“Extern” on page 85): 


extern alias W1;
extern alias W2;

class Test
{
  static void Main()
  {
    W1::Widgets.Widget w1 = new W1::Widgets.Widget();
    W2::Widgets.Widget w2 = new W2::Widgets.Widget();
  }
}
Creating Types in C#


In this chapter, we delve into types and type members. 


Classes 


A class is the most common kind of reference type. The simplest possible class dec- 
laration is as follows: 


class YourClassName 
{ 
} 


A more complex class optionally has the following: 


Preceding the keyword class     Attributes and class modifiers. The non-nested class modifiers
                                are public, internal, abstract, sealed, static, unsafe, and partial

Following YourClassName         Generic type parameters and constraints, a base class, and interfaces

Within the braces               Class members (these are methods, properties, indexers, events,
                                fields, constructors, overloaded operators, nested types, and a
                                finalizer)


This chapter covers all of these constructs except attributes, operator functions, and
the unsafe keyword, which are covered in Chapter 4. The following sections enumerate
each of the class members.


Fields 


A field is a variable that is a member of a class or struct; for example: 


class Octopus
{
  string name;
  public int Age = 10;
}

Fields allow the following modifiers:


Static modifier          static
Access modifiers         public internal private protected
Inheritance modifier     new
Unsafe code modifier     unsafe
Read-only modifier       readonly
Threading modifier       volatile


The readonly modifier 


The readonly modifier prevents a field from being modified after construction. A 
read-only field can be assigned only in its declaration or within the enclosing type’s 
constructor. 
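As a minimal sketch (the Sensor class and its fields are hypothetical), the following shows the two places where a read-only field may be assigned, with the illegal assignments left as comments:

```csharp
using System;

class Sensor
{
  public readonly string Id = "unassigned";   // Assigned in the declaration...
  public readonly int Channel;

  public Sensor (string id, int channel)
  {
    Id = id;             // ...and (re)assigned in the constructor: both are legal
    Channel = channel;
  }

  // void Rename (string id) => Id = id;   // Compile-time error: not in a constructor

  static void Main()
  {
    var s = new Sensor ("temp-1", 4);
    Console.WriteLine (s.Id + " " + s.Channel);   // temp-1 4
    // s.Channel = 5;                           // Compile-time error
  }
}
```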


Field initialization 
Field initialization is optional. An uninitialized field has a default value (0, \0, null, 
false). Field initializers run before constructors: 
public int Age = 10; 
A field initializer can contain expressions and call methods:


static readonly string TempFolder = System.IO.Path.GetTempPath();


Declaring multiple fields together 


For convenience, you can declare multiple fields of the same type in a comma-separated
list. This is a convenient way for all the fields to share the same attributes
and field modifiers:


static readonly int legs = 8,
                    eyes = 2;


Constants 


A constant is evaluated statically at compile time and the compiler literally substitutes
its value whenever used (rather like a macro in C++). A constant can be any of
the built-in numeric types, bool, char, string, or an enum type.


A constant is declared with the const keyword and must be initialized with a value. 
For example: 


public class Test
{
  public const string Message = "Hello World";
}

A constant can serve a similar role to a static readonly field, but it is much more
restrictive, both in the types you can use and in field initialization semantics. A
constant also differs from a static readonly field in that the evaluation of the constant
occurs at compile time; thus:


public static double Circumference (double radius)
{
  return 2 * System.Math.PI * radius;
}

is compiled to:

public static double Circumference (double radius)
{
  return 6.2831853071795862 * radius;
}

It makes sense for PI to be a constant because its value is predetermined at compile 
time. In contrast, a static readonly field’s value can potentially differ each time 
the program is run: 


static readonly DateTime StartupTime = DateTime.Now; 
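One practical consequence of compile-time evaluation is that a constant can appear where the language demands a constant expression, such as a case label, whereas a static readonly field cannot. A sketch using a hypothetical Limits class:

```csharp
using System;

class Limits
{
  public const int MaxItems = 3;                               // Substituted at compile time
  public static readonly DateTime StartupTime = DateTime.Now;  // Evaluated when the type initializes

  public static string Describe (int count)
  {
    switch (count)
    {
      case MaxItems: return "full";   // OK: a case label requires a compile-time constant
      // A static readonly int field would be rejected here, since it is not a constant.
      default: return "not full";
    }
  }

  static void Main()
  {
    Console.WriteLine (Describe (3));   // full
    Console.WriteLine (Describe (2));   // not full
    Console.WriteLine (StartupTime);    // Differs from run to run
  }
}
```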


A static readonly field is also advantageous when exposing
to other assemblies a value that might change in a later
version. For instance, suppose that assembly X exposes a constant
as follows:


public const decimal ProgramVersion = 2.3M;


If assembly Y references X and uses this constant, the value 2.3
will be baked into assembly Y when compiled. This means that
if X is later recompiled with the constant set to 2.4, Y will still
use the old value of 2.3 until Y is recompiled. A static
readonly field avoids this problem.


Another way of looking at this is that any value that might 
change in the future is not constant by definition; thus, it 
should not be represented as one. 


Constants can also be declared local to a method: 


static void Main()
{
  const double twoPI = 2 * System.Math.PI;
  ...
}


Nonlocal constants allow the following modifiers:


Access modifiers         public internal private protected
Inheritance modifier     new


Methods


A method performs an action in a series of statements. A method can receive input 
data from the caller by specifying parameters and output data back to the caller by 
specifying a return type. A method can specify a void return type, indicating that it 
doesn’t return any value to its caller. A method can also output data back to the 
caller via ref/out parameters. 


A method's signature must be unique within the type. A method's signature comprises
its name and parameter types in order (but not the parameter names, nor the
return type).


Methods allow the following modifiers: 


Static modifier              static
Access modifiers             public internal private protected
Inheritance modifiers        new virtual abstract override sealed
Partial method modifier      partial
Unmanaged code modifiers     unsafe extern
Asynchronous code modifier   async


Expression-bodied methods 
A method that comprises a single expression, such as 
int Foo (int x) { return x * 2; } 


can be written more tersely as an expression-bodied method. A fat arrow replaces the 
braces and return keyword: 


int Foo (int x) => x * 2; 
Expression-bodied functions can also have a void return type: 


void Foo (int x) => Console.WriteLine (x); 
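Here is a runnable sketch of both forms. The methods are made static and placed in a hypothetical Calculator class only so that Main can call them directly:

```csharp
using System;

class Calculator
{
  public static int Foo (int x) => x * 2;                     // Returns a value
  public static void Show (int x) => Console.WriteLine (x);   // void return type

  static void Main() => Show (Foo (21));   // 42
}
```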


Overloading methods 


A type can overload methods (have multiple methods with the same name) as long 
as the signatures are different. For example, the following methods can all coexist in 
the same type: 


void Foo (int x) {...} 

void Foo (double x) {...} 

void Foo (int x, float y) {...} 
void Foo (float x, int y) {...} 


However, the following pairs of methods cannot coexist in the same type, because 
the return type and the params modifier are not part of a method's signature: 


void Foo (int x) {...} 
float Foo (int x) {...} // Compile-time error 
void Goo (int[] x) {...}
void Goo (params int[] x) {...} // Compile-time error 
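To see how the compiler chooses among the four legal Foo overloads listed earlier, the following sketch has each one report its own signature (the string return values and the Overloads class are illustrative additions). A call such as Foo (1, 2) would not compile: each two-parameter overload needs exactly one int-to-float conversion, so neither is better and the call is ambiguous.

```csharp
using System;

class Overloads
{
  // Each overload reports which one the compiler chose.
  public static string Foo (int x)          => "Foo(int)";
  public static string Foo (double x)       => "Foo(double)";
  public static string Foo (int x, float y) => "Foo(int, float)";
  public static string Foo (float x, int y) => "Foo(float, int)";

  static void Main()
  {
    Console.WriteLine (Foo (1));       // Foo(int)
    Console.WriteLine (Foo (1.5));     // Foo(double)
    Console.WriteLine (Foo (1, 2f));   // Foo(int, float)
    Console.WriteLine (Foo (1f, 2));   // Foo(float, int)
    // Foo (1, 2) would not compile: the call is ambiguous.
  }
}
```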


Pass-by-value versus pass-by-reference 


Whether a parameter is pass-by-value or pass-by-reference is also part of the signature.
For example, Foo(int) can coexist with either Foo(ref int) or Foo(out int).
However, Foo(ref int) and Foo(out int) cannot coexist:


void Foo (int x) {...} 
void Foo (ref int x) {...} // OK so far 
void Foo (out int x) {...} // Compile-time error 
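The sketch below pairs Foo(int) with Foo(ref int) to show the behavioral difference: the by-value overload increments a copy, whereas the ref overload increments the caller's variable. The Run helper and RefDemo class are hypothetical:

```csharp
using System;

class RefDemo
{
  public static void Foo (int x)     { x++; }   // Increments a copy
  public static void Foo (ref int x) { x++; }   // Increments the caller's variable

  public static int Run (bool byRef)
  {
    int n = 1;
    if (byRef) Foo (ref n); else Foo (n);
    return n;
  }

  static void Main()
  {
    Console.WriteLine (Run (false));   // 1
    Console.WriteLine (Run (true));    // 2
  }
}
```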


Local methods 
You can define a method within another method: 


void WriteCubes()
{
  Console.WriteLine (Cube (3));
  Console.WriteLine (Cube (4));
  Console.WriteLine (Cube (5));

  int Cube (int value) => value * value * value;
}


The local method (Cube, in this case) is visible only to the enclosing method 
(WriteCubes). This simplifies the containing type and instantly signals to anyone 
looking at the code that Cube is used nowhere else. Another benefit of local methods 
is that they can access the local variables and parameters of the enclosing method. 
This has a number of consequences, which we describe in detail in “Capturing 
Outer Variables” on page 166 in Chapter 4. 
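As a brief sketch of that access, the hypothetical SumOfCubes method below declares a local method that reads the enclosing method's offset parameter without it being passed in:

```csharp
using System;

class LocalDemo
{
  public static int SumOfCubes (int[] values, int offset)
  {
    int total = 0;
    foreach (int v in values) total += Cube (v);
    return total;

    // Local method: visible only inside SumOfCubes, and able to read 'offset'.
    int Cube (int value)
    {
      int n = value + offset;
      return n * n * n;
    }
  }

  static void Main()
  {
    Console.WriteLine (SumOfCubes (new[] { 1, 2 }, 0));   // 9
    Console.WriteLine (SumOfCubes (new[] { 1, 2 }, 1));   // 35
  }
}
```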


Local methods can appear within other function kinds, such as property accessors,
constructors, and so on. You can even put local methods inside other local methods,
and inside lambda expressions that use a statement block (Chapter 4). Local methods
can be iterators (Chapter 4) or asynchronous (Chapter 14).


Prior to C# 8, the static modifier was invalid for local methods. A local method
inside a static method can't access instance members in any case, because there is
no this reference to capture.


Static local methods (C# 8) 


Adding the static modifier to a local method prevents it from seeing the local variables
and parameters of the enclosing method. This helps to reduce coupling as well
as enabling the local method to declare variables as it pleases, without risk of colliding
with those in the containing method.
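A minimal sketch (it requires C# 8 or later; the Scale and Multiply names are hypothetical): because Multiply is static, anything it needs from the enclosing method must be passed to it as an argument:

```csharp
using System;

class StaticLocalDemo
{
  public static int Scale (int value)
  {
    int factor = 10;
    return Multiply (value, factor);

    // 'static' forbids capturing, so value and factor must be passed in.
    static int Multiply (int a, int b) => a * b;
    // static int Bad() => factor;   // Compile-time error: can't capture 'factor'
  }

  static void Main() => Console.WriteLine (Scale (4));   // 40
}
```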
Instance Constructors


Constructors run initialization code on a class or struct. A constructor is defined 
like a method, except that the method name and return type are reduced to the 
name of the enclosing type: 


public class Panda
{
  string name;                 // Define field
  public Panda (string n)      // Define constructor
  {
    name = n;                  // Initialization code (set up field)
  }
}


Panda p = new Panda ("Petey"); // Call constructor 


Instance constructors allow the following modifiers: 


Access modifiers public internal private protected 


Unmanaged code modifiers unsafe extern 


Single-statement constructors can also be written as expression-bodied members: 


public Panda (string n) => name = n; 


Overloading constructors 


A class or struct may overload constructors. To avoid code duplication, one constructor
can call another, using the this keyword:


using System;

public class Wine
{
  public decimal Price;
  public int Year;
  public Wine (decimal price) { Price = price; }
  public Wine (decimal price, int year) : this (price) { Year = year; }
}


When one constructor calls another, the called constructor executes first. 
You can pass an expression into another constructor, as follows: 
public Wine (decimal price, DateTime year) : this (price, year.Year) { } 


The expression itself cannot make use of the this reference—for example, to call an 
instance method. (This is enforced because the object has not been initialized by the 
constructor at this stage, so any methods that you call on it are likely to fail.) It can, 
however, call static methods. 
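The following sketch extends the Wine class to make the execution order visible. The Log field and the static ExtractYear helper are hypothetical additions; ExtractYear shows that the argument expression may call a static method:

```csharp
using System;

public class Wine
{
  public decimal Price;
  public int Year;
  public string Log = "";   // Hypothetical: records which constructor bodies ran

  public Wine (decimal price) { Price = price; Log += "price;"; }

  public Wine (decimal price, int year) : this (price) { Year = year; Log += "year;"; }

  // The argument expression may call a static method, but not an instance method.
  public Wine (decimal price, DateTime date) : this (price, ExtractYear (date)) { }

  static int ExtractYear (DateTime d) => d.Year;   // Hypothetical static helper

  static void Main()
  {
    var w = new Wine (10m, new DateTime (2019, 6, 1));
    Console.WriteLine (w.Log);    // price;year;
    Console.WriteLine (w.Year);   // 2019
  }
}
```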
Implicit parameterless constructors


For classes, the C# compiler automatically generates a parameterless public constructor
if and only if you do not define any constructors. However, as soon as you
define at least one constructor, the parameterless constructor is no longer automatically
generated.


Constructor and field initialization order 


We previously saw that fields can be initialized with default values in their 
declaration: 


class Player
{
  int shields = 50;   // Initialized first
  int health = 100;   // Initialized second
}

Field initializations occur before the constructor is executed, and in the declaration 
order of the fields. 
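To observe this order, the sketch below routes each initializer through a hypothetical static Record helper that appends to a shared log. The log has to be static because an instance field initializer can't reference instance members:

```csharp
using System;
using System.Text;

class Player
{
  // Static because an instance field initializer can't reference instance members.
  public static readonly StringBuilder Log = new StringBuilder();

  int shields = Record ("shields ", 50);   // Initialized first
  int health = Record ("health ", 100);    // Initialized second

  public Player() => Log.Append ("constructor");   // Runs after both initializers

  // Hypothetical helper: records when each initializer runs, then returns the value.
  static int Record (string name, int value)
  {
    Log.Append (name);
    return value;
  }

  static void Main()
  {
    new Player();
    Console.WriteLine (Log);   // shields health constructor
  }
}
```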


Nonpublic constructors 


Constructors do not need to be public. A common reason to have a nonpublic constructor
is to control instance creation via a static method call. The static method
could be used to return an object from a pool rather than creating a new object, or
to return various subclasses based on input arguments:


public class Class1
{
  Class1() {}                             // Private constructor
  public static Class1 Create (...)
  {
    // Perform custom logic here to return an instance of Class1
  }
}

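Here is one concrete way to fill in the Create pattern, as a sketch: a hypothetical Connection class whose private constructor forces callers through Create, which hands back a pooled instance when one is available:

```csharp
using System;
using System.Collections.Generic;

public class Connection
{
  // Hypothetical pool of instances that have been handed back.
  static readonly Stack<Connection> pool = new Stack<Connection>();

  public static int CreatedCount { get; private set; }

  Connection() => CreatedCount++;   // Private: outside code can't write new Connection()

  public static Connection Create() => pool.Count > 0 ? pool.Pop() : new Connection();

  public static void Release (Connection c) => pool.Push (c);

  static void Main()
  {
    var a = Create();
    Release (a);
    var b = Create();                            // Reuses the pooled instance
    Console.WriteLine (ReferenceEquals (a, b));  // True
    Console.WriteLine (CreatedCount);            // 1
  }
}
```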
Deconstructors 


A deconstructor (also called a deconstructing method) acts as an approximate opposite
to a constructor: whereas a constructor typically takes a set of values (as parameters)
and assigns them to fields, a deconstructor does the reverse and assigns fields
back to a set of variables.


A deconstruction method must be called Deconstruct, and have one or more out 
parameters, such as in the following class: 


class Rectangle
{
  public readonly float Width, Height;

  public Rectangle (float width, float height)
  {
    Width = width;
    Height = height;
  }

  public void Deconstruct (out float width, out float height)
  {
    width = Width;
    height = Height;
  }
}


The following special syntax calls the deconstructor: 


var rect = new Rectangle (3, 4); 
(float width, float height) = rect; // Deconstruction 
Console.WriteLine (width + " " + height);    // 3 4


The second line is the deconstructing call. It creates two local variables and then 
calls the Deconstruct method. Our deconstructing call is equivalent to the 
following: 


float width, height; 
rect.Deconstruct (out width, out height); 


Or: 
rect.Deconstruct (out var width, out var height); 

Deconstructing calls allow implicit typing, so we could shorten our call to this: 
(var width, var height) = rect; 

Or simply this: 
var (width, height) = rect; 


You can use C#’s discard symbol (_) if you're uninterested in 
one or more variables: 

var (_, height) = rect; 
This better indicates your intention than declaring a variable 
that you never use. 


If the variables into which you're deconstructing are already defined, omit the types 
altogether: 


float width, height; 

(width, height) = rect; 
This is called a deconstructing assignment. You can use a deconstructing assignment 
to simplify your class’s constructor: 


public Rectangle (float width, float height) => 
(Width, Height) = (width, height); 
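Putting the pieces together, the following sketch compiles the Rectangle class with the deconstructing-assignment constructor and exercises both the implicitly typed form and the discard form:

```csharp
using System;

class Rectangle
{
  public readonly float Width, Height;

  // Deconstructing assignment in the constructor
  public Rectangle (float width, float height) =>
    (Width, Height) = (width, height);

  public void Deconstruct (out float width, out float height)
  {
    width = Width;
    height = Height;
  }

  static void Main()
  {
    var rect = new Rectangle (3, 4);
    var (width, height) = rect;       // Calls Deconstruct
    var (_, onlyHeight) = rect;       // Discards the width
    Console.WriteLine (width + " " + height);   // 3 4
    Console.WriteLine (onlyHeight);             // 4
  }
}
```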
You can offer the caller a range of deconstruction options by overloading the
Deconstruct method.


The Deconstruct method can be an extension method (see
“Extension Methods” on page 193 in Chapter 4). This is a useful
trick if you want to deconstruct types that you did not
author.
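As an illustration, the following sketch adds a hypothetical Deconstruct extension method to System.DateTime, a type we did not author, so that a date can be split into its parts:

```csharp
using System;

// Hypothetical extension method that lets DateTime be deconstructed.
static class DateTimeExtensions
{
  public static void Deconstruct (this DateTime d, out int year, out int month, out int day)
  {
    year = d.Year;
    month = d.Month;
    day = d.Day;
  }
}

class Program
{
  static void Main()
  {
    var (y, m, d) = new DateTime (2019, 12, 25);   // Calls the extension Deconstruct
    Console.WriteLine ($"{y}-{m}-{d}");            // 2019-12-25
  }
}
```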


Object Initializers 


To simplify object initialization, any accessible fields or properties of an object can 
be set via an object initializer directly after construction. For example, consider the 
following class: 


public class Bunny
{
  public string Name;
  public bool LikesCarrots;
  public bool LikesHumans;

  public Bunny () {}
  public Bunny (string n) { Name = n; }
}


Using object initializers, you can instantiate Bunny objects as follows: 


// Note parameterless constructors can omit empty parentheses 
Bunny b1 = new Bunny { Name="Bo", LikesCarrots=true, LikesHumans=false }; 
Bunny b2 = new Bunny ("Bo") { LikesCarrots=true, LikesHumans=false }; 


The code to construct b1 and b2 is precisely equivalent to the following: 


Bunny temp1 = new Bunny();      // temp1 is a compiler-generated name
temp1.Name = "Bo";
temp1.LikesCarrots = true;
temp1.LikesHumans = false;
Bunny b1 = temp1;

Bunny temp2 = new Bunny ("Bo");
temp2.LikesCarrots = true;
temp2.LikesHumans = false;
Bunny b2 = temp2;

The temporary variables ensure that if an exception is thrown during initialization,
you can't end up with a half-initialized object.


Object initializers were introduced in C# 3.0. 
Object Initializers Versus Optional Parameters


Instead of using object initializers, we could make Bunny’s constructor accept 
optional parameters: 


public Bunny (string name,
              bool likesCarrots = false,
              bool likesHumans = false)
{
  Name = name;
  LikesCarrots = likesCarrots;
  LikesHumans = likesHumans;
}


This would allow us to construct a Bunny as follows: 


Bunny b1 = new Bunny (name: "Bo", 
likesCarrots: true); 


An advantage of this approach is that we could make Bunny’s fields (or properties, 
which we explain shortly) read-only if we choose. Making fields or properties read- 
only is good practice when there’s no valid reason for them to change throughout 
the life of the object. 


The disadvantage in this approach is that each optional parameter value is baked 
into the calling site. In other words, C# translates our constructor call into this: 


Bunny b1 = new Bunny ("Bo", true, false); 


This can be problematic if we instantiate the Bunny class from another assembly, and
later modify Bunny by adding another optional parameter, such as likesCats.
Unless the referencing assembly is also recompiled, it will continue to call the (now
nonexistent) constructor with three parameters and fail at runtime. (A subtler problem
is that if we changed the value of one of the optional parameters, callers in other
assemblies would continue to use the old optional value until they were
recompiled.)


Hence, you should exercise caution with optional parameters in public functions if 
you want to offer binary compatibility between assembly versions. 











The this Reference 


The this reference refers to the instance itself. In the following example, the Marry 
method uses this to set the partner’s mate field: 


public class Panda
{
  public Panda Mate;

  public void Marry (Panda partner)
  {
    Mate = partner;
    partner.Mate = this;
  }
}


The this reference also disambiguates a local variable or parameter from a field; for 
example: 


public class Test
{
  string name;
  public Test (string name) { this.name = name; }
}


The this reference is valid only within nonstatic members of a class or struct. 


Properties 


Properties look like fields from the outside, but internally they contain logic, like 
methods do. For example, you can't tell by looking at the following code whether 
CurrentPrice is a field or a property: 
Stock msft = new Stock();
msft.CurrentPrice = 30;
msft.CurrentPrice -= 3;
Console.WriteLine (msft.CurrentPrice);

A property is declared like a field but with a get/set block added. Here’s how to 
implement CurrentPrice as a property: 


public class Stock 


{ 
decimal currentPrice; // The private "backing" field 
public decimal CurrentPrice // The public property 
{ 
get { return currentPrice; } 
set { currentPrice = value; } 
} 
} 


get and set denote property accessors. The get accessor runs when the property is 
read. It must return a value of the property’s type. The set accessor runs when the 
property is assigned. It has an implicit parameter named value of the property’s 
type that you typically assign to a private field (in this case, currentPrice). 


Although properties are accessed in the same way as fields, they differ in that they
give the implementer complete control over getting and setting its value. This control
enables the implementer to choose whatever internal representation is needed
without exposing the internal details to the user of the property. In this example, the
set method could throw an exception if value was outside a valid range of values.
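For example, the following sketch gives Stock's set accessor a validation rule. The rule itself (rejecting negative prices) is a hypothetical choice for illustration:

```csharp
using System;

public class Stock
{
  decimal currentPrice;   // The private "backing" field

  public decimal CurrentPrice
  {
    get { return currentPrice; }
    set
    {
      // Hypothetical rule: a price can't be negative.
      if (value < 0)
        throw new ArgumentOutOfRangeException (nameof (value), "Price cannot be negative");
      currentPrice = value;
    }
  }

  static void Main()
  {
    var msft = new Stock();
    msft.CurrentPrice = 30;
    msft.CurrentPrice -= 3;
    Console.WriteLine (msft.CurrentPrice);   // 27
    try { msft.CurrentPrice = -1; }
    catch (ArgumentOutOfRangeException) { Console.WriteLine ("Rejected"); }
  }
}
```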


Throughout this book, we use public fields extensively to keep 
the examples free of distraction. In a real application, you 
would typically favor public properties over public fields in 
order to promote encapsulation. 
Properties allow the following modifiers:


Static modifier              static
Access modifiers             public internal private protected
Inheritance modifiers        new virtual abstract override sealed
Unmanaged code modifiers     unsafe extern


Read-only and calculated properties 


A property is read-only if it specifies only a get accessor, and it is write-only if it 
specifies only a set accessor. Write-only properties are rarely used. 


A property typically has a dedicated backing field to store the underlying data. 
However, a property can also be computed from other data: 


decimal currentPrice, sharesOwned;

public decimal Worth
{
  get { return currentPrice * sharesOwned; }
}


Expression-bodied properties 


You can declare a read-only property, such as the one in the preceding example, 
more tersely as an expression-bodied property. A fat arrow replaces all the braces and 
the get and return keywords: 


public decimal Worth => currentPrice * sharesOwned; 
With a little extra syntax, set accessors can also be expression-bodied: 


public decimal Worth
{
  get => currentPrice * sharesOwned;
  set => sharesOwned = value / currentPrice;
}


Automatic properties 


The most common implementation for a property is a getter and/or setter that simply
reads and writes to a private field of the same type as the property. An automatic
property declaration instructs the compiler to provide this implementation. We can
improve the first example in this section by declaring CurrentPrice as an automatic
property:

public class Stock
{
  public decimal CurrentPrice { get; set; }
}

The compiler automatically generates a private backing field of a compiler-generated
name that cannot be referred to. The set accessor can be marked private
or protected if you want to expose the property as read-only to other types. Automatic
properties were introduced in C# 3.0.


Property initializers 
You can add a property initializer to automatic properties, just as with fields: 
public decimal CurrentPrice { get; set; } = 123; 


This gives CurrentPrice an initial value of 123. Properties with an initializer can be 
read-only: 


public int Maximum { get; } = 999; 


get and set accessibility


The get and set accessors can have different access levels. The typical use case for 
this is to have a public property with an internal or private access modifier on 
the setter: 


public class Foo
{
  private decimal x;
  public decimal X
  {
    get { return x; }
    private set { x = Math.Round (value, 2); }
  }
}
Notice that you declare the property itself with the more permissive access level 
(public, in this case), and add the modifier to the accessor you want to be less 
accessible. 
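The sketch below shows the effect: code inside Foo may assign X (here via a hypothetical Update method), and the private setter rounds the value, while code in another class can only read it:

```csharp
using System;

public class Foo
{
  private decimal x;

  public decimal X
  {
    get { return x; }
    private set { x = Math.Round (value, 2); }   // Assignable only from within Foo
  }

  // Hypothetical method that uses the private setter.
  public void Update (decimal newValue) => X = newValue;
}

class Program
{
  static void Main()
  {
    var f = new Foo();
    f.Update (3.14159m);
    Console.WriteLine (f.X);   // 3.14
    // f.X = 1;                // Compile-time error: the set accessor is inaccessible here
  }
}
```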


CLR property implementation 
C# property accessors internally compile to methods called get_XXX and set_XXX:


public decimal get_CurrentPrice() {...}
public void set_CurrentPrice (decimal value) {...}
Simple nonvirtual property accessors are inlined by the Just-In-Time (JIT) compiler, 
eliminating any performance difference between accessing a property and a field. 
Inlining is an optimization in which a method call is replaced with the body of that 
method. 


With properties in Windows Runtime libraries, the compiler assumes the put_XXX 
naming convention rather than set_XXX. 
Indexers


Indexers provide a natural syntax for accessing elements in a class or struct that 
encapsulate a list or dictionary of values. Indexers are similar to properties but are 
accessed via an index argument rather than a property name. The string class has 
an indexer that lets you access each of its char values via an int index: 


string s = "hello"; 
Console.WriteLine (s[0]); // 'h' 
Console.WriteLine (s[3]); // 'l' 


The syntax for using indexers is like that for using arrays, except that the index 
argument(s) can be of any type(s). 


Indexers have the same modifiers as properties (see “Properties” on page 99) and 
can be called null-conditionally by inserting a question mark before the square 
bracket (see “Null Operators” on page 69 in Chapter 2): 


string s = null; 
Console.WriteLine (s?[0]); // Writes nothing; no error. 


Implementing an indexer 


To write an indexer, define a property called this, specifying the arguments in 
square brackets: 


class Sentence
{
  string[] words = "The quick brown fox".Split();

  public string this [int wordNum]      // indexer
  {
    get { return words [wordNum]; }
    set { words [wordNum] = value; }
  }
}


Here’s how we could use this indexer: 


Sentence s = new Sentence(); 


Console.WriteLine (s[3]); // fox 
s[3] = "kangaroo"; 
Console.WriteLine (s[3]); // kangaroo 


A type can declare multiple indexers, each with parameters of different types. An 
indexer can also take more than one parameter: 


public string this [int arg1, string arg2] 
{ 
get { ... } set { ... } 
} 
If you omit the set accessor, an indexer becomes read-only, and you can use 
expression-bodied syntax to shorten its definition: 


public string this [int wordNum] => words [wordNum]; 
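The following sketch assembles the Sentence class into a runnable program and adds a second, hypothetical indexer keyed by string, illustrating that a type can declare multiple indexers with different parameter types:

```csharp
using System;

class Sentence
{
  string[] words = "The quick brown fox".Split();

  public string this [int wordNum]        // Read/write indexer
  {
    get { return words [wordNum]; }
    set { words [wordNum] = value; }
  }

  // Hypothetical second indexer: finds a word's position by its text.
  public int this [string word] => Array.IndexOf (words, word);

  static void Main()
  {
    var s = new Sentence();
    Console.WriteLine (s[3]);         // fox
    s[3] = "kangaroo";
    Console.WriteLine (s[3]);         // kangaroo
    Console.WriteLine (s["quick"]);   // 1
  }
}
```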
CLR indexer implementation

Indexers internally compile to methods called get_Item and set_Item, as follows: 
public string get_Item (int wordNum) {...} 
public void set_Item (int wordNum, string value) {...} 

Using indices and ranges with indexers (C# 8) 


You can support indices and ranges (see “Indices and Ranges (C# 8)” on page 49 in 
Chapter 2) in your own classes by defining an indexer with a parameter type of 
Index or Range. We could extend our previous example by adding the following 
indexers to the Sentence class: 


public string this [Index index] => words [index]; 
public string[] this [Range range] => words [range]; 
Sentence s = new Sentence();
Console.WriteLine (s [^1]);          // fox
string[] firstTwoWords = s [..2];    // (The, quick)
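A self-contained version of the Index and Range indexers follows. It assumes C# 8 on .NET Core 3.0 or later, where System.Index and System.Range exist and arrays can be sliced with a Range:

```csharp
using System;

class Sentence
{
  string[] words = "The quick brown fox".Split();

  public string this [Index index] => words [index];      // Accepts ^1, ^2, and plain ints
  public string[] this [Range range] => words [range];    // Accepts ..2, 1..^1, and so on

  static void Main()
  {
    var s = new Sentence();
    Console.WriteLine (s [^1]);                       // fox
    Console.WriteLine (string.Join (" ", s [..2]));   // The quick
    Console.WriteLine (s [0]);                        // The (int converts implicitly to Index)
  }
}
```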


Static Constructors 


A static constructor executes once per type rather than once per instance. A type can 
define only one static constructor, and it must be parameterless and have the same 
name as the type: 


class Test
{
  static Test() { Console.WriteLine ("Type Initialized"); }
}


The runtime automatically invokes a static constructor just prior to the type being
used. Two things trigger this:

• Instantiating the type
• Accessing a static member in the type


The only modifiers allowed by static constructors are unsafe and extern. 


If a static constructor throws an unhandled exception (Chapter 4),
that type becomes unusable for the life of the
application.


Static constructors and field initialization order 


Static field initializers run just before the static constructor is called. If a type has no 
static constructor, static field initializers will execute just prior to the type being 
used—or anytime earlier at the whim of the runtime. 
Static field initializers run in the order in which the fields are declared. The following
example illustrates this: X is initialized to 0 and Y is initialized to 3.


class Foo
{
  public static int X = Y;    // 0
  public static int Y = 3;    // 3
}


If we swap the two field initializers around, both fields are initialized to 3. The next 
example prints 0 followed by 3 because the field initializer that instantiates a Foo 
executes before X is initialized to 3: 


class Program
{
  static void Main() { Console.WriteLine (Foo.X); }   // 3
}

class Foo
{
  public static Foo Instance = new Foo();
  public static int X = 3;

  Foo() { Console.WriteLine (X); }   // 0
}


If we swap the two field declarations in Foo (Instance and X), the example prints 3
followed by 3.
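To tie these rules together, here is a sketch with a hypothetical InitCount counter: the static field initializer runs before the static constructor, and the static constructor runs only once no matter how many instances are created:

```csharp
using System;

class Test
{
  public static int InitCount;                 // Hypothetical counter (defaults to 0)
  public static string Greeting = "Hello";     // Static field initializer: runs first

  static Test()                                // Static constructor: runs once per type
  {
    InitCount++;
    Greeting += " from the static constructor";
  }

  static void Main()
  {
    new Test();
    new Test();
    Console.WriteLine (InitCount);   // 1
    Console.WriteLine (Greeting);    // Hello from the static constructor
  }
}
```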


Static Classes 


A class can be marked static, indicating that it must be composed solely of static
members and cannot be subclassed. The System.Console and System.Math classes
are good examples of static classes.


Finalizers 


Finalizers are class-only methods that execute before the garbage collector reclaims 
the memory for an unreferenced object. The syntax for a finalizer is the name of the 
class prefixed with the ~ symbol: 


class Class1
{
  ~Class1()
  {
    ...
  }
}


This is actually C# syntax for overriding Object’s Finalize method, and the compiler
expands it into the following method declaration:


protected override void Finalize()
{
  ...
  base.Finalize();
}


We discuss garbage collection and finalizers fully in Chapter 12. 


Finalizers allow the following modifier: 
Unmanaged code modifier unsafe 


You can write single-statement finalizers using expression-bodied syntax: 


~Class1() => Console.WriteLine ("Finalizing"); 


Partial Types and Methods 


Partial types allow a type definition to be split—typically across multiple files. A 
common scenario is for a partial class to be autogenerated from some other source 
(such as a Visual Studio template or designer), and for that class to be augmented 
with additional hand-authored methods: 


// PaymentFormGen.cs - auto-generated 
partial class PaymentForm { ... } 


// PaymentForm.cs - hand-authored 
partial class PaymentForm { ... } 


Each participant must have the partial declaration; the following is illegal: 


partial class PaymentForm {} 
class PaymentForm {} 


Participants cannot have conflicting members. A constructor with the same parameters,
for instance, cannot be repeated. Partial types are resolved entirely by the
compiler, which means that each participant must be available at compile time and
must reside in the same assembly.


You can specify a base class on one or more partial class declarations, as long as the
base class, if specified, is the same. In addition, each participant can independently
specify interfaces to implement. We cover base classes and interfaces in “Inheritance”
on page 106 and “Interfaces” on page 125.


The compiler makes no guarantees with regard to field initialization order between 
partial type declarations. 


Partial methods 


A partial type can contain partial methods. These let an autogenerated partial type 
provide customizable hooks for manual authoring; for example: 


partial class PaymentForm     // In auto-generated file
{
  partial void ValidatePayment (decimal amount);
}

partial class PaymentForm     // In hand-authored file
{
  partial void ValidatePayment (decimal amount)
  {
    if (amount > 100)
      ...
  }
}

A partial method consists of two parts: a definition and an implementation. The definition
is typically written by a code generator, and the implementation is typically
manually authored. If an implementation is not provided, the definition of the partial
method is compiled away (as is the code that calls it). This allows autogenerated
code to be liberal in providing hooks without having to worry about bloat. Partial
methods must be void and are implicitly private.
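Below is a compilable sketch of the PaymentForm example with both halves in one file (in practice they would live in separate files). The Submit method and the validation rule are hypothetical:

```csharp
using System;

partial class PaymentForm     // The "auto-generated" half
{
  public string Submit (decimal amount)   // Hypothetical method that calls the hook
  {
    ValidatePayment (amount);   // This call is compiled away if no implementation exists
    return "Submitted " + amount;
  }

  partial void ValidatePayment (decimal amount);   // Definition
}

partial class PaymentForm     // The hand-authored half
{
  partial void ValidatePayment (decimal amount)    // Implementation
  {
    if (amount > 100)
      throw new ArgumentException ("Amount exceeds limit");   // Hypothetical rule
  }
}

class Program
{
  static void Main()
  {
    var form = new PaymentForm();
    Console.WriteLine (form.Submit (50));   // Submitted 50
    try { form.Submit (500); }
    catch (ArgumentException ex) { Console.WriteLine (ex.Message); }   // Amount exceeds limit
  }
}
```

If the hand-authored half were deleted, the program would still compile, and Submit (500) would succeed because the ValidatePayment call disappears.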


The nameof operator 


The nameof operator returns the name of any symbol (type, member, variable, and 
so on) as a string: 


int count = 123; 
string name = nameof (count); // name is "count" 


Its advantage over simply specifying a string is that of static type checking. Tools 
such as Visual Studio can understand the symbol reference, so if you rename the 
symbol in question, all of its references will be renamed, too. 


To specify the name of a type member such as a field or property, include the type as 
well. This works with both static and instance members: 


string name = nameof (StringBuilder.Length);


This evaluates to Length. To return StringBuilder.Length, you would do this:


nameof (StringBuilder) + "." + nameof (StringBuilder.Length);
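A short sketch of both uses. The Length helper is hypothetical and shows a common pattern: naming the offending parameter in an ArgumentNullException, so that the name stays correct if the parameter is renamed:

```csharp
using System;
using System.Text;

class NameofDemo
{
  // Hypothetical helper: nameof keeps the parameter name in sync if it's renamed.
  public static int Length (string text)
  {
    if (text == null) throw new ArgumentNullException (nameof (text));
    return text.Length;
  }

  static void Main()
  {
    Console.WriteLine (nameof (StringBuilder.Length));                                  // Length
    Console.WriteLine (nameof (StringBuilder) + "." + nameof (StringBuilder.Length));   // StringBuilder.Length
    try { Length (null); }
    catch (ArgumentNullException ex) { Console.WriteLine (ex.ParamName); }               // text
  }
}
```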


Inheritance 


A class can inherit from another class to extend or customize the original class.
Inheriting from a class lets you reuse the functionality in that class instead of building
it from scratch. A class can inherit from only a single class but can itself be
inherited by many classes, thus forming a class hierarchy. In this example, we begin
by defining a class called Asset:


public class Asset
{
  public string Name;
}

Next, we define classes called Stock and House, which will inherit from Asset.
Stock and House get everything an Asset has, plus any additional members that
they define:


public class Stock : Asset     // inherits from Asset
{
  public long SharesOwned;
}

public class House : Asset     // inherits from Asset
{
  public decimal Mortgage;
}


Here’s how we can use these classes: 


Stock msft = new Stock { Name="MSFT", 
SharesOwned=1000 }; 


Console.WriteLine (msft.Name); // MSFT 
Console.WriteLine (msft.SharesOwned); // 1000 


House mansion = new House { Name="Mansion", 
Mortgage=250000 }; 


Console.WriteLine (mansion.Name); // Mansion 
Console.WriteLine (mansion.Mortgage); // 250000 
The derived classes, Stock and House, inherit the Name field from the base class,
Asset.

A derived class is also called a subclass.

A base class is also called a superclass.


Polymorphism 


References are polymorphic. This means a variable of type x can refer to an object 
that subclasses x. For instance, consider the following method: 


public static void Display (Asset asset)
{
  System.Console.WriteLine (asset.Name);
}
This method can display both a Stock and a House because they are both Assets: 


Stock msft = new Stock ... ;
House mansion = new House ... ;


Display (msft); 
Display (mansion); 
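Putting the pieces together, the following is a minimal runnable sketch of polymorphism over the Asset hierarchy. Describe is a hypothetical variant of Display that returns the name instead of printing it, so the result can be checked.

```csharp
using System;

public class Asset { public string Name; }
public class Stock : Asset { public long SharesOwned; }   // inherits Name
public class House : Asset { public decimal Mortgage; }   // inherits Name

static class PolymorphismDemo
{
    // Accepts any Asset, including subclasses such as Stock and House
    public static string Describe (Asset asset) => asset.Name;

    static void Main()
    {
        Stock msft = new Stock { Name = "MSFT", SharesOwned = 1000 };
        House mansion = new House { Name = "Mansion", Mortgage = 250000 };

        Console.WriteLine (Describe (msft));      // MSFT
        Console.WriteLine (Describe (mansion));   // Mansion
    }
}
```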





Inheritance | 107 







Polymorphism works on the basis that subclasses (Stock and House) have all the 
features of their base class (Asset). The converse, however, is not true. If Display 
was modified to accept a House, you could not pass in an Asset: 


static void Main() { Display (new Asset()); }     // Compile-time error

public static void Display (House house)          // Will not accept Asset
{
  System.Console.WriteLine (house.Mortgage);
}


Casting and Reference Conversions 


An object reference can be: 


• Implicitly upcast to a base class reference
• Explicitly downcast to a subclass reference

Upcasting and downcasting between compatible reference types performs reference
conversions: a new reference is (logically) created that points to the same object. An
upcast always succeeds; a downcast succeeds only if the object is suitably typed.


Upcasting 
An upcast operation creates a base class reference from a subclass reference: 


Stock msft = new Stock(); 
Asset a = msft; // Upcast 


After the upcast, variable a still references the same Stock object as variable msft. 
The object being referenced is not itself altered or converted: 


Console.WriteLine (a == msft); // True 


Although a and msft refer to the identical object, a has a more restrictive view on 
that object: 

Console.WriteLine (a.Name);          // OK
Console.WriteLine (a.SharesOwned);   // Compile-time error

The last line generates a compile-time error because the variable a is of type Asset,
even though it refers to an object of type Stock. To get to its SharesOwned field, you
must downcast the Asset to a Stock.


Downcasting 


A downcast operation creates a subclass reference from a base class reference: 


Stock msft = new Stock();
Asset a = msft;                      // Upcast
Stock s = (Stock)a;                  // Downcast
Console.WriteLine (s.SharesOwned);   // <No error>
Console.WriteLine (s == a);          // True
Console.WriteLine (s == msft);       // True


As with an upcast, only references are affected, not the underlying object. A
downcast requires an explicit cast because it can potentially fail at runtime:


House h = new House();
Asset a = h;               // Upcast always succeeds
Stock s = (Stock)a;        // Downcast fails: a is not a Stock


If a downcast fails, an InvalidCastException is thrown. This is an example of
runtime type checking (we elaborate on this concept in “Static and Runtime Type
Checking” on page 118).
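To see both outcomes side by side, here is a small sketch that attempts the explicit downcast and catches the failure. TryDowncast is a hypothetical helper, and catching InvalidCastException here is purely for demonstration; in real code you would normally test with is or as instead.

```csharp
using System;

public class Asset { public string Name; }
public class Stock : Asset { public long SharesOwned; }
public class House : Asset { public decimal Mortgage; }

static class CastDemo
{
    // Returns true if the explicit downcast to Stock succeeds at runtime
    public static bool TryDowncast (Asset a)
    {
        try
        {
            Stock s = (Stock)a;          // the CLR checks the object's type here
            return true;
        }
        catch (InvalidCastException)
        {
            return false;                // a did not refer to a Stock
        }
    }

    static void Main()
    {
        Asset fromStock = new Stock();   // upcast: always succeeds
        Asset fromHouse = new House();   // upcast: always succeeds

        Console.WriteLine (TryDowncast (fromStock));   // True
        Console.WriteLine (TryDowncast (fromHouse));   // False
    }
}
```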


The as operator 


The as operator performs a downcast that evaluates to null (rather than throwing
an exception) if the downcast fails:

Asset a = new Asset();
Stock s = a as Stock;   // s is null; no exception thrown

This is useful when you're going to subsequently test whether the result is null:


if (s != null) Console.WriteLine (s.SharesOwned) ; 


Without such a test, a cast is advantageous, because if it fails, a
more helpful exception is thrown. We can illustrate by comparing
the following two lines of code:

long shares = ((Stock)a).SharesOwned;     // Approach #1
long shares = (a as Stock).SharesOwned;   // Approach #2

If a is not a Stock, the first line throws an InvalidCastException,
which is an accurate description of what went wrong.
The second line throws a NullReferenceException, which is
ambiguous. Was a not a Stock or was a null?

Another way of looking at it is that with the cast operator,
you’re saying to the compiler: “I’m certain of a value’s type; if
I’m wrong, there’s a bug in my code, so throw an exception!”
Whereas with the as operator, you’re uncertain of its type and
want to branch according to the outcome at runtime.
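The difference in exception types can be observed directly. This sketch uses a hypothetical helper, ExceptionName, that evaluates a delegate and reports the type name of whatever it throws.

```csharp
using System;

public class Asset { public string Name; }
public class Stock : Asset { public long SharesOwned; }

static class AsVersusCastDemo
{
    // Evaluates f and returns the type name of any exception it throws
    public static string ExceptionName (Func<long> f)
    {
        try { f(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    static void Main()
    {
        Asset a = new Asset();                      // an Asset, not a Stock
        Console.WriteLine ((a as Stock) == null);   // True: as yields null

        Console.WriteLine (ExceptionName (() => ((Stock)a).SharesOwned));
        // InvalidCastException
        Console.WriteLine (ExceptionName (() => (a as Stock).SharesOwned));
        // NullReferenceException
    }
}
```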


The as operator cannot perform custom conversions (see “Operator Overloading” on 
page 216 in Chapter 4) and it cannot do numeric conversions: 


long x = 3 as long;    // Compile-time error

The as and cast operators will also perform upcasts, although
this is not terribly useful because an implicit conversion will
do the job.







The is operator 


The is operator tests whether a variable matches a pattern. C# supports several 
kinds of patterns, the most important being a type pattern, where a type name
follows the is keyword.


In this context, the is operator tests whether a reference conversion would succeed; 
in other words, whether an object derives from a specified class (or implements an 
interface). It is often used to test before downcasting. 


if (a is Stock) 
Console.WriteLine (((Stock)a).SharesOwned) ; 


The is operator also evaluates to true if an unboxing conversion would succeed (see 
“The object Type” on page 116). However, it does not consider custom or numeric 
conversions. 


The is operator works with many other (somewhat less
useful) kinds of patterns, introduced in C# 7 and C# 8. For a full
discussion, see “Patterns” on page 201 in Chapter 4. 


Introducing a pattern variable 
You can introduce a variable while using the is operator: 


if (a is Stock s) 
Console.WriteLine (s.SharesOwned); 


This is equivalent to the following: 


Stock s; 
if (a is Stock) 
{ 
s = (Stock) a; 
Console.WriteLine (s.SharesOwned) ; 


} 


The variable that you introduce is available for “immediate” consumption, so the 
following is legal: 


if (a is Stock s && s.SharesOwned > 100000) 
Console.WriteLine ("Wealthy"); 


And it remains in scope outside the is-expression, allowing this: 


if (a is Stock s && s.SharesOwned > 100000)
  Console.WriteLine ("Wealthy");
else
  s = new Stock();                   // s is in scope

Console.WriteLine (s.SharesOwned);   // Still in scope
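The following sketch wraps both forms in methods so the scoping rules can be exercised. IsWealthy and SharesIfWealthy are hypothetical names; the threshold of 100000 is taken from the examples above.

```csharp
using System;

public class Asset { public string Name; }
public class Stock : Asset { public long SharesOwned; }

static class PatternDemo
{
    // The pattern variable s is usable immediately on the right of &&
    public static bool IsWealthy (Asset a) => a is Stock s && s.SharesOwned > 100000;

    // s stays in scope after the if statement and is assigned on both paths
    public static long SharesIfWealthy (Asset a)
    {
        if (a is Stock s && s.SharesOwned > 100000)
            Console.WriteLine ("Wealthy");
        else
            s = new Stock();             // s is in scope

        return s.SharesOwned;            // still in scope
    }

    static void Main()
    {
        Console.WriteLine (IsWealthy (new Stock { SharesOwned = 200000 }));   // True
        Console.WriteLine (IsWealthy (new Asset()));                          // False
        Console.WriteLine (SharesIfWealthy (new Stock { SharesOwned = 50 })); // 0
    }
}
```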

Virtual Function Members


A function marked as virtual can be overridden by subclasses wanting to provide a 
specialized implementation. Methods, properties, indexers, and events can all be 
declared virtual: 


public class Asset 


{ 
public string Name; 
public virtual decimal Liability => 0;  // Expression-bodied property 


} 


(Liability => 0 is a shortcut for { get { return 0; } }. For more details on
this syntax, see “Expression-bodied properties” on page 100.)

A subclass overrides a virtual method by applying the override modifier:

public class Stock : Asset
{
  public long SharesOwned;
}


public class House : Asset 


{ 
public decimal Mortgage; 
public override decimal Liability => Mortgage; 


} 


By default, the Liability of an Asset is 0. A Stock does not need to specialize this 
behavior. However, the House specializes the Liability property to return the value 
of the Mortgage: 


House mansion = new House { Name="McMansion", Mortgage=250000 }; 
Asset a = mansion; 

Console.WriteLine (mansion.Liability); // 250000 
Console.WriteLine (a.Liability); // 250000 


The signatures, return types, and accessibility of the virtual and overridden methods 
must be identical. An overridden method can call its base class implementation via 
the base keyword (we cover this in “The base Keyword” on page 113). 


Calling virtual methods from a constructor is potentially
dangerous because authors of subclasses are unlikely to know,
when overriding the method, that they are working with a 
partially initialized object. In other words, the overriding 
method might end up accessing methods or properties that 
rely on fields not yet initialized by the constructor. 
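Here is a small sketch of that hazard, using hypothetical BaseWidget and DerivedWidget classes. The subclass assigns label in its constructor body (not in a field initializer), so the override sees null when the base constructor calls it.

```csharp
using System;

public class BaseWidget
{
    public string Captured;

    // Calls a virtual member before the subclass constructor body has run
    public BaseWidget() { Captured = Describe(); }

    public virtual string Describe() => "BaseWidget";
}

public class DerivedWidget : BaseWidget
{
    string label;   // assigned in the constructor body, so still null during BaseWidget()

    public DerivedWidget() { label = "ready"; }

    public override string Describe() => "DerivedWidget, label = " + (label ?? "null");
}

static class CtorVirtualDemo
{
    static void Main()
    {
        var w = new DerivedWidget();
        Console.WriteLine (w.Captured);     // DerivedWidget, label = null
        Console.WriteLine (w.Describe());   // DerivedWidget, label = ready
    }
}
```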


Abstract Classes and Abstract Members 


A class declared as abstract can never be instantiated. Instead, only its concrete
subclasses can be instantiated.

Abstract classes are able to define abstract members. Abstract members are like
virtual members except that they don’t provide a default implementation. That
implementation must be provided by the subclass unless that subclass is also declared
abstract:


public abstract class Asset 


{ 
// Note empty implementation 


public abstract decimal NetValue { get; } 
} 


public class Stock : Asset 


{ 
public long SharesOwned; 


public decimal CurrentPrice; 


// Override like a virtual method. 
public override decimal NetValue => CurrentPrice * SharesOwned; 


} 
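A minimal runnable sketch of the abstract NetValue property follows. The share count and price values are arbitrary examples, and AbstractDemo is a hypothetical host class.

```csharp
using System;

public abstract class Asset
{
    public abstract decimal NetValue { get; }   // no implementation here
}

public class Stock : Asset
{
    public long SharesOwned;
    public decimal CurrentPrice;

    // Overrides the abstract member like a virtual one
    public override decimal NetValue => CurrentPrice * SharesOwned;
}

static class AbstractDemo
{
    static void Main()
    {
        Asset a = new Stock { SharesOwned = 100, CurrentPrice = 12m };
        Console.WriteLine (a.NetValue);   // 1200

        // Asset x = new Asset();         // Compile-time error: Asset is abstract
    }
}
```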
Hiding Inherited Members 


A base class and a subclass can define identical members. For example: 


public class A { public int Counter = 1; } 
public class B: A { public int Counter = 2; } 


The Counter field in class B is said to hide the Counter field in class A. Usually, this 
happens by accident, when a member is added to the base type after an identical 
member was added to the subtype. For this reason, the compiler generates a
warning and then resolves the ambiguity as follows:


• References to A (at compile time) bind to A.Counter
• References to B (at compile time) bind to B.Counter

Occasionally, you want to hide a member deliberately, in which case you can apply
the new modifier to the member in the subclass. The new modifier does nothing more
than suppress the compiler warning that would otherwise result:


public class A { public int Counter = 1; } 
public class B : A { public new int Counter = 2; } 


The new modifier communicates your intent to the compiler, and to other
programmers, that the duplicate member is not an accident.


C# overloads the new keyword to have independent meanings 
in different contexts. Specifically, the new operator is different 
from the new member modifier. 

new versus override
Consider the following class hierarchy: 


public class BaseClass 


{ 


public virtual void Foo() { Console.WriteLine ("BaseClass.Foo"); } 


} 


public class Overrider : BaseClass 


{ 


public override void Foo() { Console.WriteLine ("Overrider.Foo"); } 


} 


public class Hider : BaseClass
{
public new void Foo() { Console.WriteLine ("Hider.Foo"); }
}





The differences in behavior between Overrider and Hider are demonstrated in the 
following code: 


Overrider over = new Overrider(); 

BaseClass b1 = over; 

over.Foo(); // Overrider.Foo
b1.Foo(); // Overrider.Foo 


Hider h = new Hider(); 

BaseClass b2 = h; 

h.Foo(); // Hider.Foo
b2.Foo(); // BaseClass.Foo 


Sealing Functions and Classes 


An overridden function member can seal its implementation with the sealed
keyword to prevent it from being overridden by further subclasses. In our earlier
virtual function member example, we could have sealed House’s implementation of
Liability, preventing a class that derives from House from overriding Liability, 
as follows: 


public sealed override decimal Liability { get { return Mortgage; } } 


You can also seal the class itself, implicitly sealing all the virtual functions, by
applying the sealed modifier to the class itself. Sealing a class is more common than
sealing a function member.


Although you can seal against overriding, you can’t seal a member against being 
hidden. 


The base Keyword 


The base keyword is similar to the this keyword. It serves two essential purposes: 

• Accessing an overridden function member from the subclass
• Calling a base-class constructor (see the next section)


In this example, House uses the base keyword to access Asset’s implementation of 
Liability: 


public class House : Asset 


{ 


  public override decimal Liability => base.Liability + Mortgage;
}

With the base keyword, we access Asset’s Liability property nonvirtually. This
means that we will always access Asset’s version of this property, regardless of the
instance’s actual runtime type.


The same approach works if Liability is hidden rather than overridden. (You can 
also access hidden members by casting to the base class before invoking the 
function.) 
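The sketch below exercises both techniques: base.Liability inside an override, and casting to the base class to reach a hidden field. It reuses the Asset/House and A/B shapes from earlier in this section; BaseKeywordDemo is a hypothetical host class.

```csharp
using System;

public class Asset
{
    public virtual decimal Liability => 0;
}

public class House : Asset
{
    public decimal Mortgage;
    // base.Liability calls Asset's implementation nonvirtually
    public override decimal Liability => base.Liability + Mortgage;
}

public class A { public int Counter = 1; }
public class B : A { public new int Counter = 2; }   // hides A.Counter

static class BaseKeywordDemo
{
    static void Main()
    {
        Asset a = new House { Mortgage = 250000 };
        Console.WriteLine (a.Liability);      // 250000

        B b = new B();
        Console.WriteLine (b.Counter);        // 2
        Console.WriteLine (((A)b).Counter);   // 1: the cast reaches the hidden member
    }
}
```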


Constructors and Inheritance 


A subclass must declare its own constructors. The base class’s constructors are
accessible to the derived class but are never automatically inherited. For example, if we
define Baseclass and Subclass as follows: 


public class Baseclass 


{ 
public int X; 
public Baseclass () { } 
public Baseclass (int x) { this.X = x; } 


} 


public class Subclass : Baseclass { }

the following is illegal:

Subclass s = new Subclass (123);


Subclass must hence “redefine” any constructors it wants to expose. In doing so, 
however, it can call any of the base class’s constructors via the base keyword: 


public class Subclass : Baseclass 


{ 
public Subclass (int x) : base (x) { } 


} 


The base keyword works rather like the this keyword except that it calls a
constructor in the base class.


Base-class constructors always execute first; this ensures that base initialization 
occurs before specialized initialization. 

Implicit calling of the parameterless base-class constructor


If a constructor in a subclass omits the base keyword, the base type’s parameterless 
constructor is implicitly called: 


public class BaseClass 


{ 
public int X; 
public BaseClass() { X = 1; } 


} 
public class Subclass : BaseClass 
{ 
public Subclass() { Console.WriteLine (X); } // 1 
}


If the base class has no accessible parameterless constructor, subclasses are forced to 
use the base keyword in their constructors. 

Constructor and field initialization order
When an object is instantiated, initialization takes place in the following order: 


1. From subclass to base class: 
a. Fields are initialized 


b. Arguments to base-class constructor calls are evaluated 
2. From base class to subclass: 
a. Constructor bodies execute 
The following code demonstrates: 


public class B
{
  int x = 1;          // Executes 3rd
  public B (int x)
  {
    ...               // Executes 4th
  }
}
public class D : B
{
  int y = 1;          // Executes 1st
  public D (int x)
    : base (x + 1)    // Executes 2nd
  {
     ...              // Executes 5th
  }
}
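To make the ordering observable, this sketch routes every step through a hypothetical logging helper, InitLog.Record, which records a label and passes its value through so it can sit inside field initializers and base-constructor arguments.

```csharp
using System;
using System.Collections.Generic;

public static class InitLog
{
    public static readonly List<string> Steps = new List<string>();

    // Records a step, then returns value unchanged
    public static int Record (string step, int value) { Steps.Add (step); return value; }
}

public class B
{
    int x = InitLog.Record ("B field initializer", 1);        // 3rd
    public B (int x)
    {
        InitLog.Record ("B constructor body", x);              // 4th
    }
}

public class D : B
{
    int y = InitLog.Record ("D field initializer", 1);        // 1st
    public D (int x)
        : base (InitLog.Record ("base argument", x + 1))      // 2nd
    {
        InitLog.Record ("D constructor body", x);              // 5th
    }
}

static class InitOrderDemo
{
    static void Main()
    {
        new D (1);
        Console.WriteLine (string.Join (" -> ", InitLog.Steps));
    }
}
```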


Overloading and Resolution 


Inheritance has an interesting impact on method overloading. Consider the
following two overloads:

static void Foo (Asset a) { }
static void Foo (House h) { } 


When an overload is called, the most specific type has precedence: 


House h = new House (...); 
Foo(h); // Calls Foo(House) 


The particular overload to call is determined statically (at compile time) rather than 
at runtime. The following code calls Foo(Asset), even though the runtime type of a 
is House: 


Asset a = new House (...);
Foo(a); // Calls Foo(Asset) 


If you cast Asset to dynamic (Chapter 4), the decision as to 
which overload to call is deferred until runtime and is then 
based on the object’s actual type: 


Asset a = new House (...); 
Foo ((dynamic)a); // Calls Foo(House) 
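A runnable sketch of all three calls follows. Here the two Foo overloads return strings rather than doing nothing, an assumption made only so the selected overload is visible. The dynamic call relies on the C# runtime binder, which .NET Core projects reference by default.

```csharp
using System;

public class Asset { public string Name; }
public class House : Asset { public decimal Mortgage; }

static class OverloadDemo
{
    public static string Foo (Asset a) => "Foo(Asset)";
    public static string Foo (House h) => "Foo(House)";

    static void Main()
    {
        House h = new House();
        Asset a = h;

        Console.WriteLine (Foo (h));            // Foo(House): static type is House
        Console.WriteLine (Foo (a));            // Foo(Asset): chosen at compile time
        Console.WriteLine (Foo ((dynamic)a));   // Foo(House): chosen at runtime
    }
}
```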


The object Type 


object (System.Object) is the ultimate base class for all types. Any type can be 
upcast to object. 


To illustrate how this is useful, consider a general-purpose stack. A stack is a data 
structure based on the principle of LIFO (Last-In, First-Out). A stack has two
operations: push an object on the stack, and pop an object off the stack. Here is a simple
implementation that can hold up to 10 objects: 


public class Stack 
{ 


int position; 

object[] data = new object[10]; 

public void Push (object obj) { data[position++] = obj; } 

public object Pop() { return data[--position]; } 
} 


Because Stack works with the object type, we can Push and Pop instances of any 
type to and from the Stack: 


Stack stack = new Stack(); 
stack.Push ("sausage"); 
string s = (string) stack.Pop(); // Downcast, so explicit cast is needed 


Console.WriteLine (s); // sausage 


object is a reference type, by virtue of being a class. Despite this, value types, such 
as int, can also be cast to and from object, and so be added to our stack. This
feature of C# is called type unification and is demonstrated here:


stack.Push (3); 
int three = (int) stack.Pop(); 
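Combined into one program, the Stack class above can hold both a string and a boxed int. Note that this simple Stack has a fixed capacity of 10 and no bounds checking; StackDemo is a hypothetical host class.

```csharp
using System;

public class Stack
{
    int position;
    object[] data = new object[10];   // fixed capacity of 10; no overflow checks
    public void Push (object obj) { data[position++] = obj; }
    public object Pop() { return data[--position]; }
}

static class StackDemo
{
    static void Main()
    {
        Stack stack = new Stack();
        stack.Push ("sausage");            // string upcast to object
        stack.Push (3);                    // int boxed into an object

        int three = (int) stack.Pop();     // unbox (LIFO: last pushed comes out first)
        string s = (string) stack.Pop();   // downcast

        Console.WriteLine (three);         // 3
        Console.WriteLine (s);             // sausage
    }
}
```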

When you cast between a value type and object, the CLR must perform some
special work to bridge the difference in semantics between value and reference types.
This process is called boxing and unboxing. 


In “Generics” on page 135, we describe how to improve our 
Stack class to better handle stacks with same-typed elements. 


Boxing and Unboxing 


Boxing is the act of converting a value-type instance to a reference-type instance. 
The reference type can be either the object class or an interface (which we visit
later in the chapter).¹ In this example, we box an int into an object:

int x = 9;
object obj = x;           // Box the int

Unboxing reverses the operation by casting the object back to the original value 
type: 

int y = (int)obj;         // Unbox the int

Unboxing requires an explicit cast. The runtime checks that the stated value type
matches the actual object type, and throws an InvalidCastException if the check 
fails. For instance, the following throws an exception because long does not exactly 
match int: 


object obj = 9; // 9 is inferred to be of type int 
long x = (long) obj; // InvalidCastException 


The following succeeds, however:

object obj = 9;
long x = (int) obj;

As does this: 


object obj = 3.5; // 3.5 is inferred to be of type double 
int x = (int) (double) obj; // x is now 3 


In the last example, (double) performs an unboxing and then (int) performs a 
numeric conversion. 
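The exact-type rule can be checked with a short sketch. CanUnboxAsLong is a hypothetical helper that catches the InvalidCastException purely for demonstration.

```csharp
using System;

static class UnboxDemo
{
    // Tries to unbox obj directly as long; unboxing requires an exact type match
    public static bool CanUnboxAsLong (object obj)
    {
        try { long x = (long) obj; return true; }
        catch (InvalidCastException) { return false; }
    }

    static void Main()
    {
        object boxedInt = 9;                            // boxed as int
        Console.WriteLine (CanUnboxAsLong (boxedInt));  // False

        long widened = (int) boxedInt;                  // unbox as int, then implicit int -> long
        Console.WriteLine (widened);                    // 9

        object boxedDouble = 3.5;                       // boxed as double
        int truncated = (int) (double) boxedDouble;     // unbox, then numeric conversion
        Console.WriteLine (truncated);                  // 3
    }
}
```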


Boxing conversions are crucial in providing a unified type
system. The system is not perfect, however: we’ll see in
“Generics” on page 135 that variance with arrays and generics
supports only reference conversions and not boxing conversions:


object[] a1 = new string[3]; // Legal
object[] a2 = new int[3]; // Error 





¹ The reference type can also be System.ValueType or System.Enum (Chapter 6).

Copying semantics of boxing and unboxing


Boxing copies the value-type instance into the new object, and unboxing copies the 
contents of the object back into a value-type instance. In the following example, 
changing the value of i doesn’t change its previously boxed copy: 


int i = 3; 
object boxed = i; 
i = 5;


Console.WriteLine (boxed); // 3 
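The sketch below shows the copy in both directions: mutating the local after boxing, and mutating the unboxed copy afterward, each leave the boxed object untouched. BoxThenMutate is a hypothetical helper.

```csharp
using System;

static class BoxCopyDemo
{
    // Boxes a local, then mutates the local; returns the boxed copy
    public static object BoxThenMutate()
    {
        int i = 3;
        object boxed = i;   // boxing copies the value 3 into a new object
        i = 5;              // changes the local variable only
        return boxed;
    }

    static void Main()
    {
        object boxed = BoxThenMutate();
        Console.WriteLine (boxed);        // 3

        int unboxed = (int) boxed;        // unboxing copies the value back out
        unboxed++;                        // changes the copy only
        Console.WriteLine (boxed);        // 3
        Console.WriteLine (unboxed);      // 4
    }
}
```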


Static and Runtime Type Checking 


C# programs are type-checked both statically (at compile time) and at runtime (by 
the CLR). 


Static type checking enables the compiler to verify the correctness of your program 
without running it. The following code will fail because the compiler enforces static
typing:

int x = "5";


Runtime type checking is performed by the CLR when you downcast via a reference 
conversion or unboxing: 


object y = "5"; 
int z = (int) y; // Runtime error, downcast failed 


Runtime type checking is possible because each object on the heap internally stores 
a little type token. You can retrieve this token by calling the GetType method of 
object.

The GetType Method and typeof Operator

All types in C# are represented at runtime with an instance of System.Type. There
are two basic ways to get a System.Type object:

• Call GetType on the instance
• Use the typeof operator on a type name

GetType is evaluated at runtime; typeof is evaluated statically at compile time
(when generic type parameters are involved, it’s resolved by the JIT compiler).


System.Type has properties for such things as the type’s name, assembly, base type,
and so on: 


using System;

public class Point { public int X, Y; }

class Test
{
  static void Main()
  {
    Point p = new Point();
    Console.WriteLine (p.GetType().Name);             // Point
    Console.WriteLine (typeof (Point).Name);          // Point
    Console.WriteLine (p.GetType() == typeof(Point)); // True
    Console.WriteLine (p.X.GetType().Name);           // Int32
    Console.WriteLine (p.Y.GetType().FullName);       // System.Int32
  }
}


System.Type also has methods that act as a gateway to the runtime’s reflection
model, described in Chapter 19.


The ToString Method 


The ToString method returns the default textual representation of a type instance. 
This method is overridden by all built-in types. Here is an example of using the int
type’s ToString method:

int x = 1;
string s = x.ToString();     // s is "1"


You can override the ToString method on custom types as follows: 


public class Panda 


{ 


public string Name; 
public override string ToString() => Name; 


} 


Panda p = new Panda { Name = "Petey" }; 
Console.WriteLine (p); // Petey 


If you don’t override ToString, the method returns the type name.

When you call an overridden object member such as
ToString directly on a value type, boxing doesn’t occur.
Boxing then occurs only if you cast:

int x = 1;
string s1 = x.ToString();    // Calling on nonboxed value
object box = x;
string s2 = box.ToString();  // Calling on boxed value
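Both behaviors, an overridden ToString and the default type-name result, appear in this sketch. Bear is a hypothetical class with no override; because it is declared in the global namespace, its default ToString returns just "Bear" (a namespaced type would return its full name).

```csharp
using System;

public class Panda
{
    public string Name;
    public override string ToString() => Name;
}

public class Bear { public string Name; }   // hypothetical: no ToString override

static class ToStringDemo
{
    static void Main()
    {
        Console.WriteLine (new Panda { Name = "Petey" });   // Petey
        Console.WriteLine (new Bear { Name = "Bruno" });    // Bear

        int x = 1;
        Console.WriteLine (x.ToString());                   // 1 (no boxing)
    }
}
```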


Object Member Listing 
Here are all the members of object: 


public class Object 
{ 
public Object(); 
public extern Type GetType(); 


public virtual bool Equals (object obj); 
public static bool Equals (object objA, object objB);
public static bool ReferenceEquals (object objA, object objB); 


public virtual int GetHashCode(); 
public virtual string ToString(); 


protected virtual void Finalize(); 
protected extern object MemberwiseClone(); 


} 


We describe the Equals, ReferenceEquals, and GetHashCode methods in “Equality 
Comparison” on page 296 in Chapter 6. 


Structs 


A struct is similar to a class, with the following key differences: 


• A struct is a value type, whereas a class is a reference type.
• A struct does not support inheritance (other than implicitly deriving from
object, or more precisely, System.ValueType).

A struct can have all of the members that a class can, except the following:

• A parameterless constructor
• Field initializers
• A finalizer
• Virtual or protected members


A struct is appropriate when value-type semantics are desirable. Good examples of 
structs are numeric types, where it is more natural for assignment to copy a value 
rather than a reference. Because a struct is a value type, each instance does not 
require instantiation of an object on the heap; this results in a useful savings when 
creating many instances of a type. For instance, creating an array of value type 
requires only a single heap allocation. 


Because structs are value types, an instance cannot be null. The default value for a
struct is an empty instance, with all fields empty (set to their default values).

Struct Construction Semantics

The construction semantics of a struct are as follows:

• A parameterless constructor that you can’t override implicitly exists. This performs
a bitwise zeroing of its fields (setting them to their default values).
• When you define a struct constructor, you must explicitly assign every field.

(And you can’t have field initializers.) Here is an example of declaring and calling
struct constructors:


public struct Point 


{ 

int x, y; 

public Point (int x, int y) { this.x = x; this.y = y; } 
} 
Point p1 = new Point (); // p1.x and p1.y will be 0 


Point p2 = new Point (1, 1); // p2.x and p2.y will be 1 


The default keyword, when applied to a struct, does the same job as its implicit 
parameterless constructor: 


Point p1 = default; 

This can serve as a convenient shortcut when calling methods:





void Foo (Point p) { ... } 


Foo (default); // Equivalent to Foo (new Point()); 
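A runnable version of these struct examples follows. Unlike the Point above, this sketch makes X and Y public so the zeroed and assigned values can be observed; Describe is a hypothetical helper that takes a Point, so default can be passed directly.

```csharp
using System;

public struct Point
{
    public int X, Y;
    public Point (int x, int y) { X = x; Y = y; }   // must assign every field
}

static class StructDemo
{
    public static string Describe (Point p) => $"({p.X}, {p.Y})";

    static void Main()
    {
        Point p1 = new Point();         // implicit parameterless constructor: fields zeroed
        Point p2 = new Point (1, 1);
        Point p3 = default;             // same as new Point()

        Console.WriteLine (Describe (p1));        // (0, 0)
        Console.WriteLine (Describe (p2));        // (1, 1)
        Console.WriteLine (Describe (default));   // (0, 0)
        Console.WriteLine (p1.Equals (p3));       // True
    }
}
```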
The next example generates three compile-time errors: 


public struct Point 


{ 
int x = 1; // Illegal: field initializer
int y; 
public Point() {} // Illegal: parameterless constructor 


public Point (int x) {this.x = x;} // Illegal: must assign field y 
} 


Changing struct to class makes this example legal. 


Read-only Structs and Functions 


From C# 7.2, you can apply the readonly modifier to a struct to enforce that all 
fields are readonly; this aids in declaring intent as well as allowing the compiler 
more optimization freedom: 


readonly struct Point 


{ 
public readonly int X, Y; // X and Y must be readonly 


} 


If you need to apply readonly at a more granular level, C# 8 assists with a new
feature whereby you can apply the readonly modifier to a struct’s functions. This
ensures that if the function attempts to modify any field, a compile-time error is 
generated: 


struct Point 


{ 
public int X, Y; 
public readonly void ResetX() => X = 0; // Error!
} 


If a readonly function calls a non-readonly function, the compiler generates a 
warning (and defensively copies the struct to avoid the possibility of a mutation). 


Ref Structs 


Ref structs were introduced in C# 7.2 as a niche feature primarily
for the benefit of the Span<T> and ReadOnlySpan<T>
structs that we describe in Chapter 24 (and the highly optimized
Utf8JsonReader that we describe in Chapter 11). These
structs help with a micro-optimization technique that aims to
reduce memory allocations.


Unlike reference types, whose instances always live on the heap, value types live
in-place (wherever the variable was declared). If a value type appears as a parameter or
local variable, it will reside on the stack: 


struct Point { public int X, Y; } 


void SomeMethod() 


{ 
Point p;  // p will reside on the stack 


} 


But if a value type appears as a field in a class, it will reside on the heap: 


class MyClass 
{ 


Point p; // Lives on heap, because MyClass instances live on the heap 


} 


Similarly, arrays of structs live on the heap, and boxing a struct sends it to the heap. 


From C# 7.2, you can add the ref modifier to a struct’s declaration to ensure that it 
can only ever reside on the stack. Attempting to use a ref struct in such a way that it 
could reside on the heap generates a compile-time error: 


ref struct Point { public int X, Y; } 
class MyClass { Point P; }     // Error: will not compile!


var points = new Point [100]; // Error: will not compile! 


Ref structs were introduced mainly for the benefit of the Span<T> and
ReadOnlySpan<T> structs. Because Span<T> and ReadOnlySpan<T> instances can exist only on
the stack, it’s possible for them to safely wrap stack-allocated memory. 


Ref structs cannot partake in any C# feature that directly or indirectly introduces 
the possibility of existing on the heap. This includes a number of advanced C#
features that we describe in Chapter 4, namely lambda expressions, iterators, and
asynchronous functions (because, behind the scenes, these features all create hidden
classes with fields). Also, ref structs cannot appear inside non-ref structs, and they 
cannot implement interfaces (because this could result in boxing). 

Access Modifiers


To promote encapsulation, a type or type member can limit its accessibility to other 
types and other assemblies by adding one of six access modifiers to the declaration: 


public 
Fully accessible. This is the implicit accessibility for members of an enum or 
interface. 


internal 
Accessible only within the containing assembly or friend assemblies. This is the 
default accessibility for non-nested types. 


private 
Accessible only within the containing type. This is the default accessibility for 
members of a class or struct. 


protected 
Accessible only within the containing type or subclasses. 


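protected internal
The union of protected and internal accessibility. A member that is
protected internal is accessible in two ways: from anywhere within the containing
assembly, or from subclasses (including subclasses in other assemblies).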


private protected (from C# 7.2) 
The intersection of protected and internal accessibility. A member that is 
private protected is accessible only within the containing type, or subclasses 
that reside in the same assembly (making it less accessible than protected or 
internal alone). 


Examples 
Class2 is accessible from outside its assembly; Class1 is not: 


class Class1 {} // Class1 is internal (default)
public class Class2 {} 


ClassB exposes field x to other types in the same assembly; ClassA does not: 


class ClassA { int x; } // x is private (default) 
class ClassB { internal int x; } 


Functions within Subclass can call Bar but not Foo: 


class BaseClass 


{ 
void Foo() {} // Foo is private (default) 
protected void Bar() {} 

} 

class Subclass : BaseClass 

{ 
void Test1() { Foo(); } // Error - cannot access Foo
void Test2() { Bar(); } // OK
} 


Friend Assemblies 


You can expose internal members to other friend assemblies by adding the 
System.Runtime.CompilerServices.InternalsVisibleTo assembly attribute, 
specifying the name of the friend assembly as follows: 


[assembly: InternalsVisibleTo ("Friend") ] 


If the friend assembly has a strong name (see Chapter 18), you must specify its full 
160-byte public key: 


[assembly: InternalsVisibleTo ("StrongFriend, PublicKey=0024f000048c...")]


You can extract the full public key from a strongly named assembly with a LINQ 
query (we explain LINQ in detail in Chapter 8): 


using System.Linq;
using System.Reflection;

string key = string.Join ("",
  Assembly.GetExecutingAssembly().GetName().GetPublicKey()
    .Select (b => b.ToString ("x2")));


The companion sample in LINQPad invites you to browse to 
an assembly and then copies the assembly’s full public key to 
the clipboard. 


Accessibility Capping 


A type caps the accessibility of its declared members. The most common example of 
capping is when you have an internal type with public members. For example, 
consider this: 


class C { public void Foo() {} } 


C’s (default) internal accessibility caps Foo’s accessibility, effectively making Foo 
internal. A common reason Foo would be marked public is to make for easier 
refactoring should C later be changed to public. 


Restrictions on Access Modifiers 


When overriding a base class function, accessibility must be identical on the
overridden function; for example:


class BaseClass { protected virtual void Foo() {} } 
class Subclass1 : BaseClass { protected override void Foo() {} } // OK 
class Subclass2 : BaseClass { public override void Foo() {} } // Error 


(An exception is when overriding a protected internal method in another
assembly, in which case the override must simply be protected.)


The compiler prevents any inconsistent use of access modifiers. For example, a
subclass itself can be less accessible than a base class, but not more:

internal class A {}
public class B: A {} // Error 


Interfaces 


An interface is similar to a class, but only specifies behavior and does not hold state 
(data). Consequently: 


• An interface can define only functions and not fields.
• Interface members are implicitly abstract. (Although nonabstract functions are
permitted from C# 8, this is considered a special case, which we describe in
“Default Interface Members (C# 8)” on page 129.)
• A class (or struct) can implement multiple interfaces. In contrast, a class can
inherit from only a single class, and a struct cannot inherit at all (aside from
deriving from System.ValueType).

An interface declaration is like a class declaration, but it (typically) provides no
implementation for its members because its members are implicitly abstract. These
members will be implemented by the classes and structs that implement the
interface. An interface can contain only functions; that is, methods, properties, events,
and indexers (which, noncoincidentally, are precisely the members of a class that
can be abstract).


Here is the definition of the IEnumerator interface, defined in System.Collections:


public interface IEnumerator
{
  bool MoveNext();
  object Current { get; }
  void Reset();
}


Interface members are always implicitly public and (apart from the static and
default members permitted from C# 8) cannot declare an access modifier.
Implementing an interface means providing a public implementation for all of
its members:


internal class Countdown : IEnumerator
{
  int count = 11;
  public bool MoveNext() => count-- > 0;
  public object Current => count;
  public void Reset() { throw new NotSupportedException(); }
}


You can implicitly cast an object to any interface that it implements: 


IEnumerator e = new Countdown(); 
while (e.MoveNext()) 
Console.Write (e.Current); // 109876543210 
Even though Countdown is an internal class, its members that
implement IEnumerator can be called publicly by casting an
instance of Countdown to IEnumerator. For instance, if a
public type in the same assembly defined a method as follows:


public static class Util
{
  public static object GetCountDown() => new Countdown();
}
a caller from another assembly could do this: 


IEnumerator e = (IEnumerator) Util.GetCountDown(); 
e.MoveNext(); 


If IEnumerator were itself defined as internal, this wouldn’t
be possible.


Extending an Interface 
Interfaces can derive from other interfaces; for instance: 


public interface IUndoable { void Undo(); } 
public interface IRedoable : IUndoable { void Redo(); } 


IRedoable “inherits” all the members of IUndoable. In other words, types that
implement IRedoable must also implement the members of IUndoable.
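Here is a minimal sketch of a type implementing the derived interface. The Editor class and its Log list are hypothetical, added only so the calls can be checked. Because IRedoable extends IUndoable, the class must supply Undo as well as Redo, and an IRedoable reference converts implicitly to IUndoable:

```csharp
using System.Collections.Generic;

public interface IUndoable { void Undo(); }
public interface IRedoable : IUndoable { void Redo(); }

// Hypothetical implementer: omitting either Undo or Redo
// would be a compile-time error, since IRedoable includes IUndoable's members.
public class Editor : IRedoable
{
  public readonly List<string> Log = new List<string>();

  public void Undo() => Log.Add ("Undo");
  public void Redo() => Log.Add ("Redo");
}
```

Through an IRedoable reference you can call both Undo and Redo, and the same reference can be assigned to an IUndoable variable without a cast.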


Explicit Interface Implementation 


Implementing multiple interfaces can sometimes result in a collision between
member signatures. You can resolve such collisions by explicitly implementing an
interface member. Consider the following example:


interface I1 { void Foo(); }
interface I2 { int Foo(); }

public class Widget : I1, I2
{
  public void Foo()
  {
    Console.WriteLine ("Widget's implementation of I1.Foo");
  }

  int I2.Foo()
  {
    Console.WriteLine ("Widget's implementation of I2.Foo");
    return 42;
  }
}


Because I1 and I2 have conflicting Foo signatures, Widget explicitly implements I2’s
Foo method. This lets the two methods coexist in one class. The only way to call an
explicitly implemented member is to cast to its interface:


Widget w = new Widget();
w.Foo();         // Widget's implementation of I1.Foo
((I1)w).Foo();   // Widget's implementation of I1.Foo
((I2)w).Foo();   // Widget's implementation of I2.Foo
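The following sketch is a testable variant of Widget. It assumes a hypothetical LastCall field that records which implementation ran, in place of the Console output. It shows that the explicit I2.Foo is reachable only through an I2 reference:

```csharp
interface I1 { void Foo(); }
interface I2 { int Foo(); }

public class Widget : I1, I2
{
  public string LastCall = "";   // hypothetical, for observing calls

  // Implicit implementation: callable on Widget directly and via I1.
  public void Foo() => LastCall = "I1.Foo";

  // Explicit implementation: callable only through an I2 reference.
  int I2.Foo()
  {
    LastCall = "I2.Foo";
    return 42;
  }
}
```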


Another reason to explicitly implement interface members is to hide members that 
are highly specialized and distracting to a type’s normal use case. For example, a 
type that implements ISerializable would typically want to avoid flaunting its 
ISerializable members unless explicitly cast to that interface. 


Implementing Interface Members Virtually 


An implicitly implemented interface member is, by default, sealed. It must be 
marked virtual or abstract in the base class in order to be overridden: 


public interface IUndoable { void Undo(); }

public class TextBox : IUndoable
{
  public virtual void Undo() => Console.WriteLine ("TextBox.Undo");
}

public class RichTextBox : TextBox
{
  public override void Undo() => Console.WriteLine ("RichTextBox.Undo");
}


Calling the interface member through either the base class or the interface calls the 
subclass’s implementation: 


RichTextBox r = new RichTextBox();
r.Undo();                 // RichTextBox.Undo
((IUndoable)r).Undo();    // RichTextBox.Undo
((TextBox)r).Undo();      // RichTextBox.Undo
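To make the three call paths checkable, this sketch changes Undo to return a string instead of printing (an assumption of the example, not of the text). Because the implicit implementation is virtual, every path dispatches to the override:

```csharp
public interface IUndoable { string Undo(); }

// Implicit implementation marked virtual so subclasses can override it.
public class TextBox : IUndoable
{
  public virtual string Undo() => "TextBox.Undo";
}

public class RichTextBox : TextBox
{
  public override string Undo() => "RichTextBox.Undo";
}
```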


An explicitly implemented interface member cannot be marked virtual, nor can it 
be overridden in the usual manner. It can, however, be reimplemented. 


Reimplementing an Interface in a Subclass 


A subclass can reimplement any interface member already implemented by a base 
class. Reimplementation hijacks a member implementation (when called through 
the interface) and works whether or not the member is virtual in the base class. It 
also works whether a member is implemented implicitly or explicitly—although it 
works best in the latter case, as we will demonstrate. 


In the following example, TextBox implements IUndoable.Undo explicitly, and so it 
cannot be marked as virtual. To “override” it, RichTextBox must reimplement 
IUndoable’s Undo method: 


public interface IUndoable { void Undo(); }

public class TextBox : IUndoable
{
  void IUndoable.Undo() => Console.WriteLine ("TextBox.Undo");
}

public class RichTextBox : TextBox, IUndoable
{
  public void Undo() => Console.WriteLine ("RichTextBox.Undo");
}


Calling the reimplemented member through the interface calls the subclass’s 
implementation: 


RichTextBox r = new RichTextBox(); 
r.Undo(); // RichTextBox.Undo Case 1 
((IUndoable)r).Undo(); // RichTextBox.Undo Case 2 


Assuming the same RichTextBox definition, suppose that TextBox implemented 
Undo implicitly: 


public class TextBox : IUndoable
{
  public void Undo() => Console.WriteLine ("TextBox.Undo");
}


This would give us another way to call Undo, which would “break” the system, as 
shown in Case 3: 


RichTextBox r = new RichTextBox();
r.Undo();                 // RichTextBox.Undo      Case 1
((IUndoable)r).Undo();    // RichTextBox.Undo      Case 2
((TextBox)r).Undo();      // TextBox.Undo          Case 3


Case 3 demonstrates that reimplementation hijacking is effective only when a
member is called through the interface and not through the base class. This is usually
undesirable in that it can create inconsistent semantics. This makes reimplementation
most appropriate as a strategy for overriding explicitly implemented interface
members.
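The following sketch reproduces Case 3 in testable form. It assumes Undo returns a string rather than printing. The subclass's Undo hides the base method, so the sketch adds the new modifier to make that hiding explicit; without it, the code still compiles but produces a warning:

```csharp
public interface IUndoable { string Undo(); }

// Base class implements Undo implicitly and nonvirtually.
public class TextBox : IUndoable
{
  public string Undo() => "TextBox.Undo";
}

// Relisting IUndoable reimplements the interface in the subclass.
public class RichTextBox : TextBox, IUndoable
{
  public new string Undo() => "RichTextBox.Undo";
}
```

Calls through the interface reach the subclass, but a call through the base class type still reaches TextBox.Undo. This is the inconsistency described above.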


Alternatives to interface reimplementation 
Even with explicit member implementation, interface reimplementation is
problematic for a couple of reasons:

• The subclass has no way to call the base class method.

• The base class author might not anticipate that a method be reimplemented
and might not allow for the potential consequences.

Reimplementation can be a good last resort when subclassing hasn’t been
anticipated. A better option, however, is to design a base class such that reimplementation
will never be required. There are two ways to achieve this:

• When implicitly implementing a member, mark it virtual if appropriate.

• When explicitly implementing a member, use the following pattern if you
anticipate that subclasses might need to override any logic:





128 | Chapter 3: Creating Types in C# 


public class TextBox : IUndoable
{
  void IUndoable.Undo() => Undo();    // Calls method below
  protected virtual void Undo() => Console.WriteLine ("TextBox.Undo");
}

public class RichTextBox : TextBox
{
  protected override void Undo() => Console.WriteLine ("RichTextBox.Undo");
}
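Here is a checkable sketch of the pattern above. It assumes Undo returns a string and adds a hypothetical BaseUndo method. BaseUndo shows that, unlike with reimplementation, the subclass can still reach the base logic through base.Undo():

```csharp
public interface IUndoable { string Undo(); }

public class TextBox : IUndoable
{
  // Explicit implementation forwards to an overridable protected method.
  string IUndoable.Undo() => Undo();
  protected virtual string Undo() => "TextBox.Undo";
}

public class RichTextBox : TextBox
{
  protected override string Undo() => "RichTextBox.Undo";

  // Hypothetical helper: the subclass can still call the base implementation.
  public string BaseUndo() => base.Undo();
}
```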


If you don’t anticipate any subclassing, you can mark the class as sealed to preempt
interface reimplementation.


Interfaces and Boxing 


Converting a struct to an interface causes boxing. Calling an implicitly implemented 
member on a struct does not cause boxing: 


interface I { void Foo(); }
struct S : I { public void Foo() {} }

S s = new S();
s.Foo();      // No boxing.

I i = s;      // Box occurs when casting to interface.
i.Foo();
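Boxing copies the struct into a heap object, and a mutable struct makes that copy observable. In the following sketch, ICounter and Counter are hypothetical types introduced only for illustration. Incrementing through the interface reference changes the boxed copy, not the original variable:

```csharp
interface ICounter
{
  void Increment();
  int Value { get; }
}

// Hypothetical mutable struct, used only to make boxing visible.
struct Counter : ICounter
{
  int value;
  public void Increment() => value++;
  public int Value => value;
}
```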


Default Interface Members (C# 8) 


From C# 8, you can add a default implementation to an interface member, making 
it optional to implement: 


interface ILogger
{
  void Log (string text) => Console.WriteLine (text);
}


This is advantageous if you want to add a member to an interface defined in a
popular library without breaking (potentially thousands of) implementations.


Default implementations are always explicit, so if a class implementing ILogger fails 
to define a Log method, the only way to call it is through the interface: 


class Logger : ILogger { } 


((ILogger)new Logger()).Log ("message"); 


This prevents a problem of multiple implementation inheritance: if the same default
member is added to two interfaces that a class implements, there is never an
ambiguity as to which member is called.
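Here is a minimal sketch of that claim. It uses two hypothetical interfaces that each give Log a default body returning a string. The class implements both without defining Log, so each default is reached only through its own interface. (This requires a runtime that supports default interface members, such as .NET Core 3.0 or later.)

```csharp
interface IFileLogger    { string Log (string text) => "File: " + text; }
interface IConsoleLogger { string Log (string text) => "Console: " + text; }

// Hypothetical class that inherits both defaults. Calling d.Log(...) directly
// would not compile, because default implementations are explicit.
class DualLogger : IFileLogger, IConsoleLogger { }
```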


Interfaces can also now define static members (including fields), which can be 
accessed from code inside default implementations: 
interface ILogger
{
  void Log (string text) =>
    Console.WriteLine (Prefix + text);

  static string Prefix = "";
}


Because interface members are implicitly public, you can also access static members 
from the outside: 


ILogger.Prefix = "File log: "; 


You can restrict this by adding an accessibility modifier to the static interface
member (such as private, protected, or internal).
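This sketch combines the two features. It uses a hypothetical Format method that returns a string instead of the text's Log, so the default implementation reads the static Prefix field, and outside code sets that field through the interface name:

```csharp
interface ILogger
{
  // Default implementation that reads a static field of the interface.
  string Format (string text) => Prefix + text;

  // Static field: implicitly public, so callers can assign it via ILogger.Prefix.
  static string Prefix = "";
}

// Hypothetical implementer that relies on the default Format.
class Logger : ILogger { }
```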





Writing a Class Versus an Interface 


As a guideline:

• Use classes and subclasses for types that naturally share an implementation.

• Use interfaces for types that have independent implementations.

Consider the following classes:


abstract class Animal {}
abstract class Bird           : Animal {}
abstract class Insect         : Animal {}
abstract class FlyingCreature : Animal {}
abstract class Carnivore      : Animal {}

// Concrete classes:

class Ostrich : Bird {}
class Eagle   : Bird, FlyingCreature, Carnivore {}   // Illegal
class Bee     : Insect, FlyingCreature {}            // Illegal
class Flea    : Insect, Carnivore {}                 // Illegal


The Eagle, Bee, and Flea classes do not compile because inheriting from multiple
classes is prohibited. To resolve this, we must convert some of the types to
interfaces. The question then arises, which types? Following our general rule, we could
say that insects share an implementation, and birds share an implementation, so
they remain classes. In contrast, flying creatures have independent mechanisms for
flying, and carnivores have independent strategies for eating animals, so we would
convert FlyingCreature and Carnivore to interfaces:


interface IFlyingCreature {} 
interface ICarnivore {} 
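With FlyingCreature and Carnivore converted to interfaces, the hierarchy compiles: each concrete class has one base class plus any number of interfaces. Here is a sketch of the complete set of declarations:

```csharp
abstract class Animal {}
abstract class Bird   : Animal {}
abstract class Insect : Animal {}

interface IFlyingCreature {}
interface ICarnivore {}

// Concrete classes: a single base class, plus interfaces as needed.
class Ostrich : Bird {}
class Eagle   : Bird, IFlyingCreature, ICarnivore {}
class Bee     : Insect, IFlyingCreature {}
class Flea    : Insect, ICarnivore {}
```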


In a typical scenario, Bird and Insect might correspond to a Windows control and a 
web control; FlyingCreature and Carnivore might correspond to IPrintable and 
IUndoable. 
Instance fields are (still) prohibited. This is in line with the principle of interfaces,
which is to define behavior, not state. 


Enums 


An enum is a special value type that lets you specify a group of named numeric
constants. For example:


public enum BorderSide { Left, Right, Top, Bottom } 
We can use this enum type as follows: 


BorderSide topSide = BorderSide.Top;
bool isTop = (topSide == BorderSide.Top); // true 


Each enum member has an underlying integral value. These are, by default: 
• Underlying values are of type int.

• The constants 0, 1, 2... are automatically assigned, in the declaration order of
the enum members.


You can specify an alternative integral type, as follows: 
public enum BorderSide : byte { Left, Right, Top, Bottom } 

You can also specify an explicit underlying value for each enum member: 
public enum BorderSide : byte { Left=1, Right=2, Top=10, Bottom=11 } 


The compiler also lets you explicitly assign some of the enum
members. The unassigned enum members keep incrementing
from the last explicit value. The preceding example is
equivalent to the following:


public enum BorderSide : byte 
{ Left=1, Right, Top=10, Bottom } 
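You can confirm the implicit numbering by casting each member back to its underlying type. In this sketch, Right and Bottom continue from the last explicit value, and Enum.GetUnderlyingType reports byte:

```csharp
// Left and Top are assigned explicitly; Right and Bottom continue from them.
public enum BorderSide : byte { Left=1, Right, Top=10, Bottom }
```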


Enum Conversions 


You can convert an enum instance to and from its underlying integral value with an 
explicit cast: 


int i = (int) BorderSide.Left; 
BorderSide side = (BorderSide) i; 
bool leftOrRight = (int) side <= 2; 


You can also explicitly cast one enum type to another. Suppose that
HorizontalAlignment is defined as follows:


public enum HorizontalAlignment
{
  Left = BorderSide.Left,
  Right = BorderSide.Right,
  Center
}
A translation between the enum types uses the underlying integral values:


HorizontalAlignment h = (HorizontalAlignment) BorderSide.Right; 
// same as: 
HorizontalAlignment h = (HorizontalAlignment) (int) BorderSide.Right; 
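Here is a self-contained sketch of these conversions. It adds explicit (int) casts in HorizontalAlignment's member initializers so that the conversion from BorderSide is spelled out. Center continues numbering after Right, and the literal 0 converts to either enum without a cast:

```csharp
public enum BorderSide { Left, Right, Top, Bottom }

public enum HorizontalAlignment
{
  Left  = (int) BorderSide.Left,    // 0
  Right = (int) BorderSide.Right,   // 1
  Center                            // 2: continues from Right
}
```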


The numeric literal 0 is treated specially by the compiler in an enum expression and 
does not require an explicit cast: 


BorderSide b = 0; // No cast required 
if (b == 0)... 


There are two reasons for the special treatment of 0: 


• The first member of an enum is often used as the default value.

• For combined enum types, 0 means no flags.


Flags Enums 


You can combine enum members. To prevent ambiguities, members of a combinable
enum require explicitly assigned values, typically in powers of two:


[Flags] 
enum BorderSides { None=0, Left=1, Right=2, Top=4, Bottom=8 } 


or: 
enum BorderSides { None=0, Left=1, Right=1<<1, Top=1<<2, Bottom=1<<3 } 


To work with combined enum values, you use bitwise operators such as | and &. 
These operate on the underlying integral values: 


BorderSides leftRight = BorderSides.Left | BorderSides.Right; 


if ((leftRight & BorderSides.Left) != 0) 
Console.WriteLine ("Includes Left"); // Includes Left 


string formatted = leftRight.ToString(); // "Left, Right" 


BorderSides s = BorderSides.Left; 
s |= BorderSides.Right; 
Console.WriteLine (s == leftRight); // True 


s ^= BorderSides.Right;                   // Toggles BorderSides.Right
Console.WriteLine (s); // Left 


By convention, the Flags attribute should always be applied to an enum type when 
its members are combinable. If you declare such an enum without the Flags 
attribute, you can still combine members, but calling ToString on an enum instance 
will emit a number rather than a series of names. 
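The following sketch checks both the bitwise operations and the effect of the attribute. PlainSides is a hypothetical copy of BorderSides without [Flags]. Its combined value prints as a number, while the flagged version prints member names:

```csharp
using System;

[Flags]
public enum BorderSides { None=0, Left=1, Right=1<<1, Top=1<<2, Bottom=1<<3 }

// Hypothetical twin without [Flags], for comparing ToString output.
public enum PlainSides { None=0, Left=1, Right=1<<1, Top=1<<2, Bottom=1<<3 }
```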


By convention, a combinable enum type is given a plural rather than singular name. 
For convenience, you can include combination members within an enum
declaration itself:


[Flags]
enum BorderSides
{
  None=0,
  Left=1, Right=1<<1, Top=1<<2, Bottom=1<<3,
  LeftRight = Left | Right,
  TopBottom = Top | Bottom,
  All       = LeftRight | TopBottom
}
Enum Operators

The operators that work with enums are:

=   ==   !=   <   >   <=   >=   +   -   ^   &   |   ~
+=   -=   ++   --   sizeof





The bitwise, arithmetic, and comparison operators return the result of processing
the underlying integral values. Addition is permitted between an enum and an
integral type, but not between two enums.


Type-Safety Issues 
Consider the following enum: 
public enum BorderSide { Left, Right, Top, Bottom } 


Because an enum can be cast to and from its underlying integral type, the actual 
value it can have might fall outside the bounds of a legal enum member: 


BorderSide b = (BorderSide) 12345; 
Console.WriteLine (b); // 12345 


The bitwise and arithmetic operators can produce similarly invalid values: 


BorderSide b = BorderSide.Bottom; 
b++; // No errors 


An invalid BorderSide would break the following code: 


void Draw (BorderSide side)
{
  if      (side == BorderSide.Left)  {...}
  else if (side == BorderSide.Right) {...}
  else if (side == BorderSide.Top)   {...}
  else                               {...}   // Assume BorderSide.Bottom
}


One solution is to add another else clause: 


else if (side == BorderSide.Bottom) ...
else throw new ArgumentException ("Invalid BorderSide: " + side, "side");
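Here is a testable sketch of the defensive version. Describe is a hypothetical stand-in for Draw that returns a label. It shows that b++ on Bottom compiles and yields an undefined value, which the final else clause now rejects instead of silently treating as Bottom:

```csharp
using System;

public enum BorderSide { Left, Right, Top, Bottom }

static class BorderDemo
{
  // Hypothetical variant of Draw: returns a label and validates its input.
  public static string Describe (BorderSide side)
  {
    if      (side == BorderSide.Left)   return "left";
    else if (side == BorderSide.Right)  return "right";
    else if (side == BorderSide.Top)    return "top";
    else if (side == BorderSide.Bottom) return "bottom";
    else throw new ArgumentException ("Invalid BorderSide: " + side, nameof (side));
  }
}
```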
Another workaround is to explicitly check an enum value for validity. The static
Enum.IsDefined method does this job:


BorderSide side = (BorderSide) 12345; 
Console.WriteLine (Enum.IsDefined (typeof (BorderSide), side)); // False 


Unfortunately, Enum.IsDefined does not work for flagged enums. However, the
following helper method (a trick dependent on the behavior of Enum.ToString())
returns true if a given flagged enum is valid:


static bool IsFlagDefined (Enum e)
{
  decimal d;
  return !decimal.TryParse (e.ToString(), out d);
}

[Flags]
public enum BorderSides { Left=1, Right=2, Top=4, Bottom=8 }

static void Main()
{
  for (int i = 0; i <= 16; i++)
  {
    BorderSides side = (BorderSides)i;
    Console.WriteLine (IsFlagDefined (side) + " " + side);
  }
}
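Here is a self-contained sketch of the same helper, placed in a hypothetical FlagCheck class and using a discard for the parsed value. A valid flags value prints as member names, which do not parse as a number. An invalid value prints as digits, which do. By contrast, Enum.IsDefined rejects valid combinations such as Left | Right:

```csharp
using System;

[Flags]
public enum BorderSides { Left=1, Right=2, Top=4, Bottom=8 }

static class FlagCheck
{
  // Enum.ToString yields names for a representable flags value and a number
  // otherwise, so a successful numeric parse means "not a valid combination".
  public static bool IsFlagDefined (Enum e) => !decimal.TryParse (e.ToString(), out _);
}
```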


Nested Types 


A nested type is declared within the scope of another type: 


public class TopLevel
{
  public class Nested { }                  // Nested class
  public enum Color { Red, Blue, Tan }     // Nested enum
}


A nested type has the following features:

• It can access the enclosing type’s private members and everything else the
enclosing type can access.

• You can declare it with the full range of access modifiers rather than just
public and internal.

• The default accessibility for a nested type is private rather than internal.

• Accessing a nested type from outside the enclosing type requires qualification
with the enclosing type’s name (like when accessing static members).


For example, to access Color.Red from outside our TopLevel class, we’d need to do
this:


TopLevel.Color color = TopLevel.Color.Red; 
All types (classes, structs, interfaces, delegates, and enums) can be nested within
either a class or a struct. 


Here is an example of accessing a private member of a type from a nested type: 


public class TopLevel
{
  static int x;
  class Nested
  {
    static void Foo() { Console.WriteLine (TopLevel.x); }
  }
}
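The following sketch makes two of the listed features checkable. It assumes a public ReadX method and a hypothetical Hidden class, neither of which is in the text. A nested type reads a private field of its enclosing type, and a nested type declared without a modifier is private, which reflection confirms:

```csharp
public class TopLevel
{
  static int x = 42;      // private to TopLevel

  public class Nested
  {
    // A nested type can access the enclosing type's private members.
    public static int ReadX() => TopLevel.x;
  }

  class Hidden { }        // no modifier: nested types default to private

  public enum Color { Red, Blue, Tan }
}
```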
Here is an example of applying the protected access modifier to a nested type:

public class TopLevel
{
  protected class Nested { }
}

public class SubTopLevel : TopLevel
{
  static void Foo() { new TopLevel.Nested(); }
}


Here is an example of referring to a nested type from outside the enclosing type: 


public class TopLevel
{
  public class Nested { }
}

class Test
{
  TopLevel.Nested n;
}


Nested types are used heavily by the compiler itself when it generates private classes 
that capture state for constructs such as iterators and anonymous methods. 


If the sole reason for using a nested type is to avoid cluttering 
a namespace with too many types, consider using a nested 
namespace, instead. A nested type should be used because of 
its stronger access control restrictions, or when the nested 
class must access private members of the containing class. 


Generics 


C# has two separate mechanisms for writing code that is reusable across different 
types: inheritance and generics. Whereas inheritance expresses reusability with a 
base type, generics express reusability with a template that contains placeholder 
types. Generics, when compared to inheritance, can increase type safety and reduce 
casting and boxing. 
C# generics and C++ templates are similar concepts, but they
work differently. We explain this difference in “C# Generics 
Versus C++ Templates” on page 147. 


Generic Types 


A generic type declares type parameters—placeholder types to be filled in by the 
consumer of the generic type, which supplies the type arguments. Here is a generic 
type Stack<T>, designed to stack instances of type T. Stack<T> declares a single type 
parameter T: 


public class Stack<T>
{
  int position;
  T[] data = new T[100];
  public void Push (T obj) => data[position++] = obj;
  public T Pop() => data[--position];
}


We can use Stack<T> as follows: 


var stack = new Stack<int>();
stack.Push (5);
stack.Push (10);
int x = stack.Pop();        // x is 10
int y = stack.Pop();        // y is 5


Stack<int> fills in the type parameter T with the type argument int, implicitly
creating a type on the fly (the synthesis occurs at runtime). Attempting to push a string
onto our Stack<int> would, however, produce a compile-time error. Stack<int>
effectively has the following definition (with every T replaced by int, and the class
name hashed out to avoid confusion):


public class ###
{
  int position;
  int[] data = new int[100];
  public void Push (int obj) => data[position++] = obj;
  public int Pop() => data[--position];
}


Technically, we say that Stack<T> is an open type, whereas Stack<int> is a closed 
type. At runtime, all generic type instances are closed—with the placeholder types 
filled in. This means that the following statement is illegal: 


var stack = new Stack<T>(); // Illegal: What is T? 
unless it’s within a class or method that itself defines T as a type parameter: 


public class Stack<T>
{
  public Stack<T> Clone()
  {
    Stack<T> clone = new Stack<T>();   // Legal
    ...
  }
}


Why Generics Exist 


Generics exist to write code that is reusable across different types. Suppose that we
needed a stack of integers, but we didn't have generic types. One solution would be
to hardcode a separate version of the class for every required element type (e.g.,
IntStack, StringStack, etc.). Clearly, this would cause considerable code
duplication. Another solution would be to write a stack that is generalized by using object
as the element type:


public class ObjectStack
{
  int position;
  object[] data = new object[10];
  public void Push (object obj) => data[position++] = obj;
  public object Pop() => data[--position];
}


An ObjectStack, however, wouldn't work as well as a hardcoded IntStack for
specifically stacking integers. An ObjectStack would require boxing and
downcasting that could not be checked at compile time:


// Suppose we just want to store integers here:
ObjectStack stack = new ObjectStack();

stack.Push ("s");              // Wrong type, but no error!
int i = (int)stack.Pop();      // Downcast - runtime error
What we need is both a general implementation of a stack that works for all element 
types as well as a way to easily specialize that stack to a specific element type for 
increased type safety and reduced casting and boxing. Generics give us precisely this 
by allowing us to parameterize the element type. Stack<T> has the benefits of both 
ObjectStack and IntStack. Like ObjectStack, Stack<T> is written once to work 
generally across all types. Like IntStack, Stack<T> is specialized for a particular 
type—the beauty is that this type is T, which we substitute on the fly. 


ObjectStack is functionally equivalent to Stack<object>. 
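Here is a sketch that puts the two approaches side by side. GenericStack<T> is a hypothetical name, chosen to avoid clashing with System.Collections.Generic.Stack<T>. The ObjectStack accepts a string and fails only at runtime when it is unboxed as an int, whereas GenericStack<int> needs no casts at all:

```csharp
// ObjectStack as in the text: boxing on Push, an unchecked downcast on Pop.
public class ObjectStack
{
  int position;
  object[] data = new object[10];
  public void Push (object obj) => data[position++] = obj;
  public object Pop() => data[--position];
}

// Hypothetical generic counterpart: the element type is checked at compile time.
public class GenericStack<T>
{
  int position;
  T[] data = new T[10];
  public void Push (T obj) => data[position++] = obj;
  public T Pop() => data[--position];
}
```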


Generic Methods 
A generic method declares type parameters within the signature of a method. 


With generic methods, many fundamental algorithms can be implemented in a
general-purpose way. Here is a generic method that swaps the contents of two
variables of any type T:


static void Swap<T> (ref T a, ref T b)
{
  T temp = a;
  a = b;
  b = temp;
}


Swap<T> is called as follows: 

int x = 5; 

int y = 10; 

Swap (ref x, ref y); 
Generally, there is no need to supply type arguments to a generic method, because 
the compiler can implicitly infer the type. If there is ambiguity, generic methods can 
be called with type arguments as follows: 


Swap<int> (ref x, ref y); 
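Here is the same Swap<T>, placed in a hypothetical Util class so that it can be called from elsewhere. It is exercised with an inferred type argument (int) and with an explicit one (string):

```csharp
static class Util
{
  public static void Swap<T> (ref T a, ref T b)
  {
    T temp = a;
    a = b;
    b = temp;
  }
}
```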


Within a generic type, a method is not classed as generic unless it introduces type 
parameters (with the angle bracket syntax). The Pop method in our generic stack 
merely uses the type’s existing type parameter, T, and is not classed as a generic 
method. 


Methods and types are the only constructs that can introduce type parameters. 
Properties, indexers, events, fields, constructors, operators, and so on cannot 
declare type parameters, although they can partake in any type parameters already 
declared by their enclosing type. In our generic stack example, for instance, we 
could write an indexer that returns a generic item: 
public T this [int index] => data [index]; 

Similarly, constructors can partake in existing type parameters, but not introduce 
them: 


public Stack<T>() { } // Illegal 


Declaring Type Parameters 


Type parameters can be introduced in the declaration of classes, structs, interfaces,
delegates (covered in Chapter 4), and methods. Other constructs, such as
properties, cannot introduce a type parameter, but they can use one. For example, the
property Value uses T:


public struct Nullable<T>
{
  public T Value { get; }
}


A generic type or method can have multiple parameters: 
class Dictionary<TKey, TValue> {...} 
To instantiate: 
Dictionary<int,string> myDict = new Dictionary<int,string>(); 


Or: 
var myDict = new Dictionary<int,string>();


Generic type names and method names can be overloaded as long as the number of 
type parameters is different. For example, the following three type names do not 
conflict: 


class A        {}
class A<T>     {}
class A<T1,T2> {}


By convention, generic types and methods with a single type 
parameter typically name their parameter T, as long as the 
intent of the parameter is clear. When using multiple type 
parameters, each parameter is prefixed with T, but has a more 
descriptive name. 
typeof and Unbound Generic Types


Open generic types do not exist at runtime: open generic types are closed as part of 
compilation. However, it is possible for an unbound generic type to exist at runtime 
—purely as a Type object. The only way to specify an unbound generic type in C# is 
via the typeof operator: 





class A<T> {}
class A<T1,T2> {}

Type a1 = typeof (A<>);    // Unbound type (notice no type arguments).
Type a2 = typeof (A<,>);   // Use commas to indicate multiple type args.


Open generic types are used in conjunction with the Reflection API (Chapter 19). 
You can also use the typeof operator to specify a closed type: 

Type a3 = typeof (A<int,int>); 
Or, you can specify an open type (which is closed at runtime): 


class B<T> { void X() { Type t = typeof (T); } } 
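Here is a sketch that inspects these Type objects with standard reflection members. B<T> is adapted to return typeof(T) rather than storing it in a local, which is an assumption made for testability. The unbound forms are generic type definitions, and a closed type maps back to its definition:

```csharp
using System;

class A<T> {}
class A<T1,T2> {}

// Adapted from the text's B<T>: returns the closed-over T at runtime.
class B<T> { public Type X() => typeof (T); }
```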


The default Generic Value 


You can use the default keyword to get the default value for a generic type
parameter. The default value for a reference type is null, and the default value for a value
type is the result of bitwise-zeroing the value type’s fields:


static void Zap<T> (T[] array)
{
  for (int i = 0; i < array.Length; i++)
    array[i] = default(T);
}


From C# 7.1, you can omit the type argument for cases in which the compiler is able 
to infer it. We could replace the last line of code with this: 


array[i] = default; 
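Here is a sketch of Zap<T> using the C# 7.1 form, placed in a hypothetical ArrayUtil class. An int array is reset to zeros and a string array to nulls:

```csharp
static class ArrayUtil
{
  public static void Zap<T> (T[] array)
  {
    for (int i = 0; i < array.Length; i++)
      array[i] = default;   // C# 7.1+: type inferred from array[i]
  }
}
```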
Generic Constraints


By default, you can substitute a type parameter with any type whatsoever.
Constraints can be applied to a type parameter to require more specific type arguments.
These are the possible constraints:


where T : base-class   // Base-class constraint
where T : interface    // Interface constraint
where T : class        // Reference-type constraint
where T : class?       // (See "Nullable reference types")
where T : struct       // Value-type constraint (excludes Nullable types)
where T : unmanaged    // Unmanaged constraint
where T : new()        // Parameterless constructor constraint
where U : T            // Naked type constraint
where T : notnull      // Non-nullable value type, or from C# 8
                       // a non-nullable reference type.


In the following example, GenericClass<T,U> requires T to derive from (or be
identical to) SomeClass and implement Interface1, and requires U to provide a
parameterless constructor:


class SomeClass {} 
interface Interface1 {} 


class GenericClass<T,U> where T : SomeClass, Interface1
                        where U : new()
{...}


You can apply constraints wherever type parameters are defined, in both methods 
and type definitions. 


A base-class constraint specifies that the type parameter must subclass (or match) a
particular class; an interface constraint specifies that the type parameter must
implement that interface. These constraints allow instances of the type parameter to be
implicitly converted to that class or interface. For example, suppose that we want to
write a generic Max method, which returns the maximum of two values. We can take
advantage of the generic interface defined in the framework called IComparable<T>:


public interface IComparable<T>   // Simplified version of interface
{
  int CompareTo (T other);
}


CompareTo returns a positive number if this is greater than other. Using this
interface as a constraint, we can write a Max method as follows (to avoid distraction, null
checking is omitted):


static T Max<T> (T a, T b) where T : IComparable<T>
{
  return a.CompareTo (b) > 0 ? a : b;
}


The Max method can accept arguments of any type implementing IComparable<T> 
(which includes most built-in types, such as int and string): 
int z = Max (5, 10);               // 10

string last = Max ("ant", "zoo"); // zoo 
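Here is a self-contained version of Max, in a hypothetical MathUtil class, constrained to the framework's System.IComparable<T>. Both int and string satisfy the constraint:

```csharp
using System;

static class MathUtil
{
  // The interface constraint lets us call CompareTo on values of type T.
  public static T Max<T> (T a, T b) where T : IComparable<T>
    => a.CompareTo (b) > 0 ? a : b;
}
```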
The class constraint and struct constraint specify that T must be a reference type or
(non-nullable) value type. A great example of the struct constraint is the
System.Nullable<T> struct (we discuss this struct in depth in “Nullable Value Types”
on page 185 in Chapter 4):


struct Nullable<T> where T : struct {...} 


The unmanaged constraint (introduced in C# 7.3) is a stronger version of a struct 
constraint: T must be a simple value type or a struct that is (recursively) free of any 
reference types. 


The parameterless constructor constraint requires T to have a public parameterless 
constructor. If this constraint is defined, you can call new() on T: 


static void Initialize<T> (T[] array) where T : new()
{
  for (int i = 0; i < array.Length; i++)
    array[i] = new T();
}

The naked type constraint requires one type parameter to derive from (or match) 
another type parameter. In this example, the method FilteredStack returns 
another Stack, containing only the subset of elements where the type parameter U is 
of the type parameter T: 


class Stack<T>
{
  Stack<U> FilteredStack<U>() where U : T {...}
}
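The text elides FilteredStack's body. The following is one possible sketch, not the book's implementation. It assumes a hypothetical list-backed GenericStack<T>. Because U derives from (or matches) T, each element of type T can be tested with a pattern and, when it is a U, pushed onto the result:

```csharp
using System.Collections.Generic;

// Hypothetical list-backed stack, named to avoid clashing with the framework's Stack<T>.
class GenericStack<T>
{
  readonly List<T> items = new List<T>();

  public int Count => items.Count;
  public void Push (T item) => items.Add (item);

  public T Pop()
  {
    T top = items[items.Count - 1];
    items.RemoveAt (items.Count - 1);
    return top;
  }

  // Naked type constraint: U must be T or derive from T.
  public GenericStack<U> FilteredStack<U>() where U : T
  {
    var result = new GenericStack<U>();
    foreach (T item in items)
      if (item is U u) result.Push (u);   // keep only elements that are U
    return result;
  }
}
```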
Subclassing Generic Types 


A generic class can be subclassed just like a nongeneric class. The subclass can leave 
the base class's type parameters open, as in the following example: 


class Stack<T>                   {...}
class SpecialStack<T> : Stack<T> {...}


Or, the subclass can close the generic type parameters with a concrete type: 
class IntStack : Stack<int> {...} 
A subtype can also introduce fresh type arguments: 


class List<T>                     {...}
class KeyedList<T,TKey> : List<T> {...}
Technically, all type arguments on a subtype are fresh: you
could say that a subtype closes and then reopens the base type 
arguments. This means that a subclass can give new (and 
potentially more meaningful) names to the type arguments 
that it reopens: 


class List<T> {...} 
class KeyedList<TElement,TKey> : List<TElement> {...} 


Self-Referencing Generic Declarations 
A type can name itself as the concrete type when closing a type argument: 


public interface IEquatable<T> { bool Equals (T obj); } 


public class Balloon : IEquatable<Balloon>
{
  public string Color { get; set; }
  public int CC { get; set; }

  public bool Equals (Balloon b)
  {
    if (b == null) return false;
    return b.Color == Color && b.CC == CC;
  }
}
The following are also legal: 
class Foo<T> where T : IComparable<T> { ... } 
class Bar<T> where T : Bar<T> { ... } 
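A self-referencing constraint such as T : IEquatable<T> pays off in generic code. The following sketch uses the framework's System.IEquatable<T>, which the simplified declaration above mirrors, and a hypothetical Finder.IndexOf helper. The helper calls the strongly typed Equals(T) on both Balloon and int without casting:

```csharp
using System;

public class Balloon : IEquatable<Balloon>
{
  public string Color { get; set; }
  public int CC { get; set; }

  public bool Equals (Balloon b)
  {
    if (b == null) return false;
    return b.Color == Color && b.CC == CC;
  }
}

static class Finder
{
  // Hypothetical helper: T : IEquatable<T> makes Equals(T) available on items.
  public static int IndexOf<T> (T[] items, T target) where T : IEquatable<T>
  {
    for (int i = 0; i < items.Length; i++)
      if (items[i].Equals (target)) return i;
    return -1;
  }
}
```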
Static Data 


Static data is unique for each closed type: 


class Bob<T> { public static int Count; }

class Test
{
  static void Main()
  {
    Console.WriteLine (++Bob<int>.Count);     // 1
    Console.WriteLine (++Bob<int>.Count);     // 2
    Console.WriteLine (++Bob<string>.Count);  // 1
    Console.WriteLine (++Bob<object>.Count);  // 1
  }
}


Type Parameters and Conversions 


C#’s cast operator can perform several kinds of conversion, including the following: 


• Numeric conversion
• Reference conversion
• Boxing/unboxing conversion
• Custom conversion (via operator overloading; see Chapter 4)


The decision as to which kind of conversion will take place happens at compile time, 
based on the known types of the operands. This creates an interesting scenario with 
generic type parameters, because the precise operand types are unknown at compile 
time. If this leads to ambiguity, the compiler generates an error. 


The most common scenario is when you want to perform a reference conversion: 


StringBuilder Foo<T> (T arg)
{
  if (arg is StringBuilder)
    return (StringBuilder) arg;   // Will not compile
  ...
}


Without knowledge of T’s actual type, the compiler is concerned that you might 
have intended this to be a custom conversion. The simplest solution is to instead use 
the as operator, which is unambiguous because it cannot perform custom 
conversions: 
StringBuilder Foo<T> (T arg)
{
  StringBuilder sb = arg as StringBuilder;
  if (sb != null) return sb;
  ...
}


A more general solution is to first cast to object. This works because conversions
to/from object are assumed not to be custom conversions, but reference or
boxing/unboxing conversions. In this case, StringBuilder is a reference type, so it
must be a reference conversion:


return (StringBuilder) (object) arg; 


Unboxing conversions can also introduce ambiguities. The following could be an 
unboxing, numeric, or custom conversion: 


int Foo<T> (T x) => (int) x; // Compile-time error 


The solution, again, is to first cast to object and then to int (which then
unambiguously signals an unboxing conversion in this case):


int Foo<T> (T x) => (int) (object) x; 
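The workarounds above can be gathered into one compilable sketch. The class name Conversions and its method names are hypothetical; the bodies follow the text. The last call also shows that an unboxing conversion is exact: a boxed long cannot be unboxed directly as an int.

```csharp
using System;
using System.Text;

static class Conversions
{
  // 'as' cannot perform custom conversions, so this compiles.
  public static StringBuilder AsBuilder<T> (T arg) => arg as StringBuilder;

  // Going through object makes this a reference conversion.
  public static StringBuilder CastToBuilder<T> (T arg)
    => (StringBuilder) (object) arg;

  // Going through object makes this an unboxing conversion.
  public static int Unbox<T> (T x) => (int) (object) x;
}

class Test
{
  static void Main()
  {
    var sb = new StringBuilder ("hi");
    Console.WriteLine (Conversions.AsBuilder ("text") == null);     // True
    Console.WriteLine (object.ReferenceEquals (
                         Conversions.CastToBuilder (sb), sb));       // True
    Console.WriteLine (Conversions.Unbox (42));                     // 42

    try { Conversions.Unbox (42L); }            // A boxed long, not a boxed int
    catch (InvalidCastException) { Console.WriteLine ("Unboxing needs the exact type"); }
  }
}
```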


Covariance 


Assuming A is convertible to B, X has a covariant type parameter if X<A> is
convertible to X<B>.
With C#’s notion of covariance (and contravariance), “convertible” means
convertible via an implicit reference conversion, such as A subclassing B, or A
implementing B. Numeric conversions, boxing conversions, and custom conversions
are not included.


For instance, type IFoo<T> has a covariant T if the following is legal: 


IFoo<string> s = ...;
IFoo<object> b = s; 


Interfaces permit covariant type parameters (as do delegates; see Chapter 4), but 
classes do not. Arrays also allow covariance (A[] can be converted to B[] if A has an 
implicit reference conversion to B), and are discussed here for comparison. 


Covariance and contravariance (or simply “variance”) are
advanced concepts. The motivation behind introducing and
enhancing variance in C# was to allow generic interfaces and
generic types (in particular, those defined in .NET Core, such
as IEnumerable<T>) to work more as you'd expect. You can
benefit from this without understanding the details behind
covariance and contravariance.


Variance is not automatic 


To ensure static type safety, type parameters are not automatically variant. Consider 
the following: 


class Animal {} 
class Bear : Animal {} 
class Camel : Animal {} 


public class Stack<T>   // A simple Stack implementation
{
  int position;
  T[] data = new T[100];
  public void Push (T obj) => data[position++] = obj;
  public T Pop() => data[--position];
}


The following fails to compile: 


Stack<Bear> bears = new Stack<Bear>(); 
Stack<Animal> animals = bears; // Compile-time error 


That restriction prevents the possibility of runtime failure with the following code: 
animals.Push (new Camel()); // Trying to add Camel to bears 


Lack of covariance, however, can hinder reusability. Suppose, for instance, that we 
wanted to write a method to Wash a stack of Animals: 


public class ZooCleaner
{
  public static void Wash (Stack<Animal> animals) {...}
}

Calling Wash with a stack of Bears would generate a compile-time error. One
workaround is to redefine the Wash method with a constraint:


class ZooCleaner
{
  public static void Wash<T> (Stack<T> animals) where T : Animal { ... }
}


We can now call Wash as follows: 


Stack<Bear> bears = new Stack<Bear>(); 
ZooCleaner.Wash (bears); 


Another solution is to have Stack<T> implement an interface with a covariant type 
parameter, as you'll see shortly. 


Arrays


For historical reasons, array types support covariance. This means that B[] can be
cast to A[] if B subclasses A (and both are reference types):


Bear[] bears = new Bear[3]; 
Animal[] animals = bears; // OK 


The downside of this reusability is that element assignments can fail at runtime: 


animals[0] = new Camel(); // Runtime error 
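To see that runtime check in action, here is a small sketch (the animal classes follow the text; ArrayCovarianceDemo and CamelRejected are hypothetical names). The CLR rejects the store with an ArrayTypeMismatchException:

```csharp
using System;

class Animal {}
class Bear : Animal {}
class Camel : Animal {}

static class ArrayCovarianceDemo
{
  // Returns true if storing a Camel into a Bear[] (viewed as Animal[])
  // was rejected at runtime.
  public static bool CamelRejected()
  {
    Bear[] bears = new Bear[3];
    Animal[] animals = bears;        // Compiles: array covariance
    try
    {
      animals[0] = new Camel();      // The runtime type check fails here
      return false;
    }
    catch (ArrayTypeMismatchException)
    {
      return true;
    }
  }
}

class Test
{
  static void Main() => Console.WriteLine (ArrayCovarianceDemo.CamelRejected());   // True
}
```

This per-store check is the cost of array covariance; variant interfaces avoid it by ruling out the unsafe operations at compile time.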


Declaring a covariant type parameter 


Type parameters on interfaces and delegates can be declared covariant by marking
them with the out modifier. This modifier ensures that, unlike with arrays,
covariant type parameters are fully type-safe.


We can illustrate this with our Stack<T> class by having it implement the following 
interface: 


public interface IPoppable<out T> { T Pop(); } 


The out modifier on T indicates that T is used only in output positions (e.g., return 
types for methods). The out modifier flags the type parameter as covariant and 
allows us to do this: 


var bears = new Stack<Bear>();
bears.Push (new Bear());
// bears (a Stack<Bear>) implements IPoppable<Bear>. We can convert to IPoppable<Animal>:
IPoppable<Animal> animals = bears;   // Legal
Animal a = animals.Pop();


Covariance (and contravariance) in interfaces is something 
that you typically consume: it’s less common that you need to 
write variant interfaces. 
The conversion from bears to animals is permitted by the compiler by virtue of
the type parameter being covariant. This is type-safe because the case the compiler
is trying to avoid (pushing a Camel onto the stack) can’t occur: there's no
way to feed a Camel into an interface where T can appear only in output positions.


Curiously, method parameters marked as out are not eligible 
for covariance, due to a limitation in the CLR. 


We can take advantage of the ability to cast covariantly to solve the reusability
problem described earlier:


public class ZooCleaner
{
  public static void Wash (IPoppable<Animal> animals) { ... }
}
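Putting the pieces together, the following sketch lets Wash accept a Stack<Bear>. It assumes a Count property on IPoppable<T> (not part of the text's interface) so that Wash knows when to stop, and a Clean property on Animal purely for illustration. Count does not mention T at all, so the out annotation remains legal:

```csharp
using System;

public class Animal { public bool Clean { get; set; } }   // Clean is hypothetical
public class Bear : Animal {}

// Count is a hypothetical addition; it doesn't use T, so 'out T' stays legal.
public interface IPoppable<out T>
{
  T Pop();
  int Count { get; }
}

public class Stack<T> : IPoppable<T>
{
  int position;
  T[] data = new T[100];
  public void Push (T obj) => data[position++] = obj;
  public T Pop() => data[--position];
  public int Count => position;
}

public static class ZooCleaner
{
  // Returns how many animals were washed.
  public static int Wash (IPoppable<Animal> animals)
  {
    int washed = 0;
    while (animals.Count > 0)
    {
      Animal a = animals.Pop();
      a.Clean = true;
      washed++;
    }
    return washed;
  }
}

class Test
{
  static void Main()
  {
    var bears = new Stack<Bear>();
    bears.Push (new Bear());
    bears.Push (new Bear());
    // Stack<Bear> converts implicitly to IPoppable<Animal>:
    Console.WriteLine (ZooCleaner.Wash (bears));   // 2
  }
}
```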


The IEnumerator<T> and IEnumerable<T> interfaces 
described in Chapter 7 have a covariant T. This allows you to 
cast IEnumerable<string> to IEnumerable<object>, for 
instance. 


The compiler will generate an error if you use a covariant type parameter in an 
input position (e.g., a parameter to a method or a writable property). 


Covariance (and contravariance) works only for elements with
reference conversions, not boxing conversions. (This applies
both to type parameter variance and array variance.) So, if you
wrote a method that accepted a parameter of type
IPoppable<object>, you could call it with IPoppable<string>, but not
IPoppable<int>.
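Because the is operator honors variant conversions at runtime, this rule can be observed directly. In this sketch (VarianceChecks is a hypothetical name), IEnumerable<T>, which also has a covariant T, stands in for IPoppable<T>:

```csharp
using System;
using System.Collections.Generic;

static class VarianceChecks
{
  // List<string> -> IEnumerable<object>: string to object is a reference conversion.
  public static bool StringsAreObjects()
  {
    object strings = new List<string> { "a" };
    return strings is IEnumerable<object>;
  }

  // List<int> -> IEnumerable<object> would need boxing, so variance doesn't apply.
  // (Likewise, IEnumerable<object> e = new List<int>(); fails to compile.)
  public static bool IntsAreObjects()
  {
    object ints = new List<int> { 1 };
    return ints is IEnumerable<object>;
  }
}

class Test
{
  static void Main()
  {
    Console.WriteLine (VarianceChecks.StringsAreObjects());   // True
    Console.WriteLine (VarianceChecks.IntsAreObjects());      // False
  }
}
```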


Contravariance 


We previously saw that, assuming that A allows an implicit reference conversion to 
B, a type X has a covariant type parameter if X<A> allows a reference conversion to 
X<B>. Contravariance is when you can convert in the reverse direction—from X<B> 
to X<A>. This is supported if the type parameter appears only in input positions and 
is designated with the in modifier. Extending our previous example, if the Stack<T> 
class implements the following interface: 


public interface IPushable<in T> { void Push (T obj); } 
we can legally do this: 


IPushable<Animal> animals = new Stack<Animal>(); 
IPushable<Bear> bears = animals; // Legal 
bears.Push (new Bear()); 


No member in IPushable outputs a T, so we can't get into trouble by casting
animals to bears (there’s no way to Pop, for instance, through that interface).
Our Stack<T> class can implement both IPushable<T> and
IPoppable<T>, despite T having opposing variance annotations
in the two interfaces! This works because you must exercise
variance through the interface and not the class; therefore,
you must commit to the lens of either IPoppable or
IPushable before performing a variant conversion. This lens
then restricts you to the operations that are legal under the
appropriate variance rules.


This also illustrates why classes do not allow variant type
parameters: concrete implementations typically require data
to flow in both directions.
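The following sketch has one class implement both interfaces, assuming the simple array-based Stack<T> from earlier plus a hypothetical Count property. The contravariant conversion happens through the IPushable<T> lens, and the same object is then read back through the IPoppable<T> lens:

```csharp
using System;

public class Animal {}
public class Bear : Animal {}

public interface IPushable<in T> { void Push (T obj); }
public interface IPoppable<out T> { T Pop(); }

public class Stack<T> : IPushable<T>, IPoppable<T>
{
  int position;
  T[] data = new T[100];
  public void Push (T obj) => data[position++] = obj;
  public T Pop() => data[--position];
  public int Count => position;   // Hypothetical helper for the demo
}

class Test
{
  static void Main()
  {
    var animalStack = new Stack<Animal>();

    IPushable<Bear> bears = animalStack;          // Contravariant conversion (in T)
    bears.Push (new Bear());

    IPoppable<Animal> animals = animalStack;      // No variance needed here
    Console.WriteLine (animals.Pop() is Bear);    // True
    Console.WriteLine (animalStack.Count);        // 0
  }
}
```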


To give another example, consider the following interface, defined as part of .NET 
Core: 


public interface IComparer<in T>
{
  // Returns a value indicating the relative ordering of a and b
  int Compare (T a, T b);
}


Because the interface has a contravariant T, we can use an IComparer<object> to 
compare two strings: 


var objectComparer = Comparer<object>.Default; 

// objectComparer implements IComparer<object> 
IComparer<string> stringComparer = objectComparer; 

int result = stringComparer.Compare ("Brett", "Jemaine"); 
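The same contravariant conversion lets you pass an IComparer<object> wherever an IComparer<string> is expected, for instance to Array.Sort. A minimal sketch (ComparerDemo and SortWithObjectComparer are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

static class ComparerDemo
{
  public static string[] SortWithObjectComparer (string[] input)
  {
    IComparer<object> objectComparer = Comparer<object>.Default;
    IComparer<string> stringComparer = objectComparer;   // Legal: 'in T'

    string[] copy = (string[]) input.Clone();
    Array.Sort (copy, stringComparer);    // Calls Array.Sort<string> (string[], IComparer<string>)
    return copy;
  }
}

class Test
{
  static void Main()
  {
    string[] sorted = ComparerDemo.SortWithObjectComparer (
                        new[] { "Jemaine", "Brett", "Bret" });
    Console.WriteLine (string.Join (", ", sorted));   // Bret, Brett, Jemaine
  }
}
```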


Mirroring covariance, the compiler will report an error if you try to use a
contravariant type parameter in an output position (e.g., as a return value or in a
readable property).


C# Generics Versus C++ Templates 


C# generics are similar in application to C++ templates, but they work very
differently. In both cases, a synthesis between the producer and consumer must take place
in which the placeholder types of the producer are filled in by the consumer.
However, with C# generics, producer types (i.e., open types such as List<T>) can be
compiled into a library (such as mscorlib.dll). This works because the synthesis
between the producer and the consumer that produces closed types doesn't actually
happen until runtime. With C++ templates, this synthesis is performed at compile
time. This means that in C++ you don’t deploy template libraries as .dlls; they exist
only as source code. It also makes it difficult to dynamically inspect, let alone create,
parameterized types on the fly.


To dig deeper into why this is the case, consider again the Max method in C#: 


static T Max <T> (T a, T b) where T : IComparable<T>
  => a.CompareTo (b) > 0 ? a : b;
Why couldn't we have implemented it like this?


static T Max <T> (T a, T b)
  => (a > b ? a : b);             // Compile error
The reason is that Max needs to be compiled once and work for all possible values of 
T. Compilation cannot succeed, because there is no single meaning for > across all 
values of T—in fact, not every T even has a > operator. In contrast, the following 
code shows the same Max method written with C++ templates. This code will be 
compiled separately for each value of T, taking on whatever semantics > has for a 
particular T, failing to compile if a particular T does not support the > operator: 


template <class T> T Max (T a, T b)
{
  return a > b ? a : b;
}
Advanced C#


In this chapter, we cover advanced C# topics that build on concepts explored in 
Chapters 2 and 3. You should read the first four sections sequentially; you can read 
the remaining sections in any order. 


Delegates 


A delegate is an object that knows how to call a method. 


A delegate type defines the kind of method that delegate instances can call.
Specifically, it defines the method’s return type and its parameter types. The following
defines a delegate type called Transformer:


delegate int Transformer (int x); 


Transformer is compatible with any method with an int return type and a single 
int parameter, such as this: 


static int Square (int x) { return x * x; } 
Or, more tersely: 
static int Square (int x) => x * x; 
Assigning a method to a delegate variable creates a delegate instance: 
Transformer t = Square; 
You can invoke a delegate instance in the same way as a method: 
int answer = t(3); // answer is 9 
Here's a complete example: 


delegate int Transformer (int x);

class Test
{
  static void Main()
  {
    Transformer t = Square;          // Create delegate instance
    int result = t(3);               // Invoke delegate
    Console.WriteLine (result);      // 9
  }

  static int Square (int x) => x * x;
}


A delegate instance literally acts as a delegate for the caller: the caller invokes the 
delegate and then the delegate calls the target method. This indirection decouples 
the caller from the target method. 


The statement: 
Transformer t = Square; 
is shorthand for: 
Transformer t = new Transformer (Square); 


Technically, we are specifying a method group when we refer to
Square without brackets or arguments. If the method is
overloaded, C# will pick the correct overload based on the
signature of the delegate to which it’s being assigned.
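For example, with two hypothetical Square overloads in scope, the int-to-int signature of Transformer selects the int version. The delegate's Method property (defined on System.Delegate) reports which overload was bound:

```csharp
using System;

delegate int Transformer (int x);

class Test
{
  // Two overloads form a single method group named Square.
  static int Square (int x) => x * x;
  static double Square (double x) => x * x;

  public static Transformer Create() => Square;   // Binds Square (int)

  static void Main()
  {
    Transformer t = Create();
    Console.WriteLine (t.Method);   // Int32 Square(Int32)
    Console.WriteLine (t (3));      // 9
  }
}
```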


The expression: 
t(3) 

is shorthand for: 
t.Invoke(3)


A delegate is similar to a callback, a general term that captures 
constructs such as C function pointers. 


Writing Plug-in Methods with Delegates 


A delegate variable is assigned a method at runtime. This is useful for writing
plug-in methods. In this example, we have a utility method named Transform that
applies a transform to each element in an integer array. The Transform method has
a delegate parameter, which you can use for specifying a plug-in transform:


public delegate int Transformer (int x);

class Util
{
  public static void Transform (int[] values, Transformer t)
  {
    for (int i = 0; i < values.Length; i++)
      values[i] = t (values[i]);
  }
}

class Test
{
  static void Main()
  {
    int[] values = { 1, 2, 3 };
    Util.Transform (values, Square);      // Hook in the Square method
    foreach (int i in values)
      Console.Write (i + " ");            // 1 4 9
  }

  static int Square (int x) => x * x;
}


Our Transform method is a higher-order function because it’s a function that takes a
function as an argument. (A method that returns a delegate would also be a
higher-order function.)


Multicast Delegates 


All delegate instances have multicast capability. This means that a delegate instance 
can reference not just a single target method, but also a list of target methods. The + 
and += operators combine delegate instances: 


SomeDelegate d = SomeMethod1; 
d += SomeMethod2; 


The last line is functionally the same as the following: 
d = d + SomeMethod2; 


Invoking d will now call both SomeMethod1 and SomeMethod2. Delegates are invoked 
in the order in which they are added. 


The - and -= operators remove the right delegate operand from the left delegate 
operand: 


d -= SomeMethod1; 
Invoking d will now cause only SomeMethod2 to be invoked. 


Calling + or += on a delegate variable with a null value works, and it is equivalent to 
assigning the variable to a new value: 

SomeDelegate d = null; 

d += SomeMethod1; // Equivalent (when d is null) to d = SomeMethod1; 
Similarly, calling -= on a delegate variable with a single matching target is equivalent 
to assigning null to that variable. 


Delegates are immutable, so when you call += or -=, you're in
fact creating a new delegate instance and assigning it to the
existing variable.


If a multicast delegate has a nonvoid return type, the caller receives the return value
from the last method to be invoked. The preceding methods are still called, but their
return values are discarded. For most scenarios in which multicast delegates are
used, they have void return types, so this subtlety does not arise.
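A sketch of this behavior, using a hypothetical NumberSource delegate: invoking the combined delegate runs both targets but yields only the last result. If you need every result, one option is to call each target yourself via GetInvocationList (a method on System.Delegate):

```csharp
using System;

delegate int NumberSource();

class Test
{
  static int One() => 1;
  static int Two() => 2;

  public static NumberSource Combined()
  {
    NumberSource d = One;
    d += Two;                 // Invocation list: One, then Two
    return d;
  }

  // Invokes each target separately so that no return value is discarded.
  public static int SumAll (NumberSource d)
  {
    int sum = 0;
    foreach (NumberSource target in d.GetInvocationList())
      sum += target();
    return sum;
  }

  static void Main()
  {
    NumberSource d = Combined();
    Console.WriteLine (d());          // 2 (One's result is discarded)
    Console.WriteLine (SumAll (d));   // 3
  }
}
```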


All delegate types implicitly derive from System.MulticastDelegate,
which inherits from System.Delegate. C# compiles
+, -, +=, and -= operations made on a delegate to the static
Combine and Remove methods of the System.Delegate class.


Multicast delegate example 


Suppose that you wrote a method that took a long time to execute. That method 
could regularly report progress to its caller by invoking a delegate. In this example, 
the HardWork method has a ProgressReporter delegate parameter, which it invokes 
to indicate progress: 


public delegate void ProgressReporter (int percentComplete);

public class Util
{
  public static void HardWork (ProgressReporter p)
  {
    for (int i = 0; i < 10; i++)
    {
      p (i * 10);                            // Invoke delegate
      System.Threading.Thread.Sleep (100);   // Simulate hard work
    }
  }
}


To monitor progress, the Main method creates a multicast delegate instance p, such 
that progress is monitored by two independent methods: 


class Test
{
  static void Main()
  {
    ProgressReporter p = WriteProgressToConsole;
    p += WriteProgressToFile;
    Util.HardWork (p);
  }

  static void WriteProgressToConsole (int percentComplete)
    => Console.WriteLine (percentComplete);

  static void WriteProgressToFile (int percentComplete)
    => System.IO.File.WriteAllText ("progress.txt",
                                    percentComplete.ToString());
}


Instance Versus Static Method Targets 


When an instance method is assigned to a delegate object, the latter must maintain a
reference not only to the method, but also to the instance to which the method
belongs. The System.Delegate class’s Target property represents this instance (and
will be null for a delegate referencing a static method). Here’s an example:


public delegate void ProgressReporter (int percentComplete);

class Test
{
  static void Main()
  {
    X x = new X();
    ProgressReporter p = x.InstanceProgress;
    p(99);                                 // 99
    Console.WriteLine (p.Target == x);     // True
    Console.WriteLine (p.Method);          // Void InstanceProgress(Int32)
  }
}

class X
{
  public void InstanceProgress (int percentComplete)
    => Console.WriteLine (percentComplete);
}
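For contrast, a sketch that assigns a static method (StaticProgress and ForStatic are hypothetical names) shows Target coming back null, because there's no instance to hold on to:

```csharp
using System;

public delegate void ProgressReporter (int percentComplete);

class X
{
  public void InstanceProgress (int percentComplete)
    => Console.WriteLine (percentComplete);
}

class Test
{
  static void StaticProgress (int percentComplete)
    => Console.WriteLine (percentComplete);

  public static ProgressReporter ForStatic() => StaticProgress;

  static void Main()
  {
    ProgressReporter s = ForStatic();
    Console.WriteLine (s.Target == null);   // True: static target, no instance

    X x = new X();
    ProgressReporter p = x.InstanceProgress;
    Console.WriteLine (p.Target == x);      // True: instance target
  }
}
```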
Generic Delegate Types 


A delegate type can contain generic type parameters: 
public delegate T Transformer<T> (T arg); 


With this definition, we can write a generalized Transform utility method that 
works on any type: 


public class Util
{
  public static void Transform<T> (T[] values, Transformer<T> t)
  {
    for (int i = 0; i < values.Length; i++)
      values[i] = t (values[i]);
  }
}

class Test
{
  static void Main()
  {
    int[] values = { 1, 2, 3 };
    Util.Transform (values, Square);      // Hook in Square
    foreach (int i in values)
      Console.Write (i + " ");            // 1 4 9
  }

  static int Square (int x) => x * x;
}
The Func and Action Delegates


With generic delegates, it becomes possible to write a small set of delegate types that 
are so general they can work for methods of any return type and any (reasonable) 
number of arguments. These delegates are the Func and Action delegates, defined in 
the System namespace (the in and out annotations indicate variance, which we 
cover in the context of delegates shortly): 


delegate TResult Func <out TResult> ();
delegate TResult Func <in T, out TResult> (T arg);
delegate TResult Func <in T1, in T2, out TResult> (T1 arg1, T2 arg2);
... and so on, up to T16

delegate void Action ();
delegate void Action <in T> (T arg);
delegate void Action <in T1, in T2> (T1 arg1, T2 arg2);
... and so on, up to T16


These delegates are extremely general. The Transformer delegate in our previous 
example can be replaced with a Func delegate that takes a single argument of type T 
and returns a same-typed value: 


public static void Transform<T> (T[] values, Func<T,T> transformer)
{
  for (int i = 0; i < values.Length; i++)
    values[i] = transformer (values[i]);
}


The only practical scenarios not covered by these delegates are ref/out and pointer 
parameters. 
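Here is a sketch of the Func-based Transform working for two element types, with T inferred from the array in each call. The Shout method and the Action<string> line are illustrative additions; the latter shows Action targeting a method with no return value:

```csharp
using System;

public static class Util
{
  public static void Transform<T> (T[] values, Func<T,T> transformer)
  {
    for (int i = 0; i < values.Length; i++)
      values[i] = transformer (values[i]);
  }
}

class Test
{
  static int Square (int x) => x * x;
  static string Shout (string s) => s.ToUpperInvariant();

  static void Main()
  {
    int[] numbers = { 1, 2, 3 };
    Util.Transform (numbers, Square);                  // T = int
    Console.WriteLine (string.Join (" ", numbers));    // 1 4 9

    string[] words = { "hi", "there" };
    Util.Transform (words, Shout);                     // T = string
    Console.WriteLine (string.Join (" ", words));      // HI THERE

    Action<string> print = Console.WriteLine;          // Void-returning target
    print ("done");                                    // done
  }
}
```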


Prior to Framework 2.0, the Func and Action delegates did 
not exist (because generics did not exist). It’s for this historical 
reason that much of the Framework uses custom delegate 
types rather than Func and Action. 


Delegates Versus Interfaces 


A problem that you can solve with a delegate can also be solved with an interface. 
For instance, we can rewrite our original example with an interface called 
ITransformer instead of a delegate: 


public interface ITransformer
{
  int Transform (int x);
}

public class Util
{
  public static void TransformAll (int[] values, ITransformer t)
  {
    for (int i = 0; i < values.Length; i++)
      values[i] = t.Transform (values[i]);
  }
}

class Squarer : ITransformer
{
  public int Transform (int x) => x * x;
}

static void Main()
{
  int[] values = { 1, 2, 3 };
  Util.TransformAll (values, new Squarer());
  foreach (int i in values)
    Console.WriteLine (i);
}
A delegate design might be a better choice than an interface design if one or more of
these conditions are true:

• The interface defines only a single method.
• Multicast capability is needed.
• The subscriber needs to implement the interface multiple times.
In the ITransformer example, we don’t need to multicast. However, the interface
defines only a single method. Furthermore, our subscriber might need to implement 
ITransformer multiple times, to support different transforms, such as square or 
cube. With interfaces, we're forced into writing a separate type per transform 
because Test can implement ITransformer only once. This is quite cumbersome: 


class Squarer : ITransformer
{
  public int Transform (int x) => x * x;
}

class Cuber : ITransformer
{
  public int Transform (int x) => x * x * x;
}

static void Main()
{
  int[] values = { 1, 2, 3 };
  Util.TransformAll (values, new Cuber());
  foreach (int i in values)
    Console.WriteLine (i);
}
Delegate Compatibility


Type compatibility 


Delegate types are all incompatible with one another, even if their signatures are the 
same: 


delegate void D1();
delegate void D2();
...
D1 d1 = Method1;
D2 d2 = d1;                           // Compile-time error


The following, however, is permitted: 


D2 d2 = new D2 (d1); 


Delegate instances are considered equal if they have the same method targets: 


delegate void D(); 


D d1 = Method1;
D d2 = Method1; 
Console.WriteLine (d1 == d2); // True 


Multicast delegates are considered equal if they reference the same methods in the 
same order. 
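A quick sketch of the ordering rule (Method1 and Method2 are placeholder static methods; SameOrderEqual and ReversedOrderEqual are hypothetical names):

```csharp
using System;

delegate void D();

class Test
{
  static void Method1() {}
  static void Method2() {}

  public static bool SameOrderEqual()
  {
    D a = Method1; a += Method2;   // Method1, Method2
    D b = Method1; b += Method2;   // Method1, Method2
    return a == b;
  }

  public static bool ReversedOrderEqual()
  {
    D a = Method1; a += Method2;   // Method1, Method2
    D c = Method2; c += Method1;   // Method2, Method1
    return a == c;
  }

  static void Main()
  {
    Console.WriteLine (SameOrderEqual());       // True
    Console.WriteLine (ReversedOrderEqual());   // False
  }
}
```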


Parameter compatibility 


When you call a method, you can supply arguments that have more specific types
than the parameters of that method. This is ordinary polymorphic behavior. For the
same reason, a delegate can have more specific parameter types than its method
target. This is called contravariance. Here's an example:


delegate void StringAction (string s);

class Test
{
  static void Main()
  {
    StringAction sa = new StringAction (ActOnObject);
    sa ("hello");
  }

  static void ActOnObject (object o) => Console.WriteLine (o);   // hello
}


(As with type parameter variance, delegates are variant only for reference 
conversions.) 
A delegate merely calls a method on someone else’s behalf. In this case, the
StringAction is invoked with an argument of type string. When the argument is then
relayed to the target method, the argument is implicitly upcast to an object.


The standard event pattern is designed to help you utilize
contravariance through its use of the common EventArgs base
class. For example, you can have a single method invoked by
two different delegates, one passing a MouseEventArgs and the
other passing a KeyEventArgs.


Return type compatibility 


If you call a method, you might get back a type that is more specific than what you 
asked for. This is ordinary polymorphic behavior. For the same reason, a delegate’s 
target method might return a more specific type than described by the delegate. 
This is called covariance: 


delegate object ObjectRetriever();

class Test
{
  static void Main()
  {
    ObjectRetriever o = new ObjectRetriever (RetrieveString);
    object result = o();
    Console.WriteLine (result);      // hello
  }

  static string RetrieveString() => "hello";
}


ObjectRetriever expects to get back an object, but an object subclass will also do: 
delegate return types are covariant. 


Generic delegate type parameter variance 


In Chapter 3, we saw how generic interfaces support covariant and contravariant 
type parameters. The same capability exists for delegates, too. 


If you're defining a generic delegate type, it’s good practice to do the following: 


• Mark a type parameter used only on the return value as covariant (out).
• Mark any type parameters used only on parameters as contravariant (in).

Doing so allows conversions to work naturally by respecting inheritance
relationships between types.

The following delegate (defined in the System namespace) has a covariant TResult: 
delegate TResult Func<out TResult>(); 


allowing: 







Func<string> x = ...;
Func<object> y = x; 


The following delegate (defined in the System namespace) has a contravariant T: 
delegate void Action<in T> (T arg); 
allowing: 


Action<object> x = ...; 
Action<string> y = x; 
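Both conversions can be exercised in a short sketch; GetGreeting, GetNumber, and Print are hypothetical methods. As with interfaces, the variance applies only to reference conversions, so a Func<int> is not a Func<object>:

```csharp
using System;

class Test
{
  static string GetGreeting() => "hello";
  static int GetNumber() => 42;
  static void Print (object o) => Console.WriteLine (o);

  public static object ViaCovariance()
  {
    Func<string> getString = GetGreeting;
    Func<object> getObject = getString;        // Covariant TResult (out)
    return getObject();
  }

  static void Main()
  {
    Console.WriteLine (ViaCovariance());       // hello

    Action<object> printObject = Print;
    Action<string> printString = printObject;  // Contravariant T (in)
    printString ("world");                     // world

    object f = new Func<int> (GetNumber);
    Console.WriteLine (f is Func<object>);     // False: int to object needs boxing
  }
}
```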


Events 


When using delegates, two emergent roles commonly appear: broadcaster and 
subscriber. 


The broadcaster is a type that contains a delegate field. The broadcaster decides 
when to broadcast, by invoking the delegate. 


The subscribers are the method target recipients. A subscriber decides when to start 
and stop listening by calling += and -= on the broadcaster’s delegate. A subscriber 
does not know about, or interfere with, other subscribers. 


Events are a language feature that formalizes this pattern. An event is a construct
that exposes just the subset of delegate features required for the
broadcaster/subscriber model. The main purpose of events is to prevent subscribers
from interfering with one another.


The easiest way to declare an event is to put the event keyword in front of a delegate 
member: 


// Delegate definition
public delegate void PriceChangedHandler (decimal oldPrice,
                                          decimal newPrice);
public class Broadcaster
{
  // Event declaration
  public event PriceChangedHandler PriceChanged;
}


Code within the Broadcaster type has full access to PriceChanged and can treat it 
as a delegate. Code outside of Broadcaster can perform only += and -= operations 
on the PriceChanged event. 


Consider the following example. The Stock class fires its PriceChanged event every 
time the Price of the Stock changes: 


public delegate void PriceChangedHandler (decimal oldPrice,
                                          decimal newPrice);
public class Stock
{
  string symbol;
  decimal price;

  public Stock (string symbol) => this.symbol = symbol;

  public event PriceChangedHandler PriceChanged;

  public decimal Price
  {
    get => price;
    set
    {
      if (price == value) return;        // Exit if nothing has changed
      decimal oldPrice = price;
      price = value;
      if (PriceChanged != null)          // If invocation list not
        PriceChanged (oldPrice, price);  // empty, fire event.
    }
  }
}
How Do Events Work on the Inside?

Three things happen under the hood when you declare an event as follows:

public class Broadcaster
{
  public event PriceChangedHandler PriceChanged;
}

First, the compiler translates the event declaration into something close to the
following:

PriceChangedHandler priceChanged;   // private delegate
public event PriceChangedHandler PriceChanged
{
  add    { priceChanged += value; }
  remove { priceChanged -= value; }
}
The add and remove keywords denote explicit event accessors—which act rather like 
property accessors. We describe how to write these later. 


Second, the compiler looks within the Broadcaster class for references to 
PriceChanged that perform operations other than += or -=, and redirects them to the 
underlying priceChanged delegate field. 


Third, the compiler translates += and -= operations on the event to calls to the 
event's add and remove accessors. Interestingly, this makes the behavior of += and -= 
unique when applied to events: unlike in other scenarios, it’s not simply a shortcut 
for + and - followed by an assignment. 
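Writing those accessors by hand yields an event that subscribers use exactly like a field-like event. The sketch below follows the translation shown above; SubscriberCount and Fire are hypothetical members added for illustration. (The compiler's own generated accessors also make the update thread-safe, which this simple version does not attempt.)

```csharp
using System;

public delegate void PriceChangedHandler (decimal oldPrice, decimal newPrice);

public class Broadcaster
{
  PriceChangedHandler priceChanged;   // Private backing delegate

  public event PriceChangedHandler PriceChanged
  {
    add    { priceChanged += value; }
    remove { priceChanged -= value; }
  }

  // Hypothetical helpers for the demo:
  public int SubscriberCount => priceChanged?.GetInvocationList().Length ?? 0;
  public void Fire (decimal oldPrice, decimal newPrice)
    => priceChanged?.Invoke (oldPrice, newPrice);
}

class Test
{
  static void Report (decimal oldPrice, decimal newPrice)
    => Console.WriteLine (oldPrice + " -> " + newPrice);

  static void Main()
  {
    var b = new Broadcaster();
    b.PriceChanged += Report;                 // Calls the add accessor
    Console.WriteLine (b.SubscriberCount);    // 1
    b.Fire (10M, 11M);                        // 10 -> 11
    b.PriceChanged -= Report;                 // Calls the remove accessor
    Console.WriteLine (b.SubscriberCount);    // 0
  }
}
```

With explicit accessors, even code inside Broadcaster can use PriceChanged only with += and -=, which is why Fire invokes the priceChanged field directly.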











If we remove the event keyword from our example so that PriceChanged becomes 
an ordinary delegate field, our example would give the same results. However, 
Stock would be less robust insomuch as subscribers could do the following things 
to interfere with one another: 
• Replace other subscribers by reassigning PriceChanged (instead of using the +=
  operator).
• Clear all subscribers (by setting PriceChanged to null).
• Broadcast to other subscribers by invoking the delegate.


Events in Windows Runtime (WinRT) libraries have slightly
different semantics in that attaching to an event returns a
token, which is required to detach from the event. The
compiler transparently bridges this gap (by maintaining an
internal dictionary of tokens) so that you can consume WinRT
events as though they were ordinary CLR events.


Standard Event Pattern 


In almost all cases for which events are defined in the .NET Core library, their
definition adheres to a standard pattern designed to provide consistency across library
and user code. At the core of the standard event pattern is System.EventArgs: a
predefined Framework class with no members (other than the static Empty property).
EventArgs is a base class for conveying information for an event. In our Stock
example, we would subclass EventArgs to convey the old and new prices when a
PriceChanged event is fired:


public class PriceChangedEventArgs : System.EventArgs
{
  public readonly decimal LastPrice;
  public readonly decimal NewPrice;

  public PriceChangedEventArgs (decimal lastPrice, decimal newPrice)
  {
    LastPrice = lastPrice;
    NewPrice = newPrice;
  }
}


For reusability, the EventArgs subclass is named according to the information it 
contains (rather than the event for which it will be used). It typically exposes data as 
properties or as read-only fields. 


With an EventArgs subclass in place, the next step is to choose or define a delegate
for the event. There are three rules:

• It must have a void return type.
• It must accept two arguments: the first of type object, and the second a
  subclass of EventArgs. The first argument indicates the event broadcaster, and the
  second argument contains the extra information to convey.
• Its name must end with EventHandler.







The Framework defines a generic delegate called System.EventHandler<> that
satisfies these rules:


public delegate void EventHandler<TEventArgs> 
(object source, TEventArgs e) where TEventArgs : EventArgs; 


Before generics existed in the language (prior to C# 2.0), we 
would have had to instead write a custom delegate as follows: 
public delegate void PriceChangedHandler 
(object sender, PriceChangedEventArgs e); 


For historical reasons, most events within the Framework use 
delegates defined in this way. 


The next step is to define an event of the chosen delegate type. Here, we use the 
generic EventHandler delegate: 


public class Stock
{
  public event EventHandler<PriceChangedEventArgs> PriceChanged;
}
Finally, the pattern requires that you write a protected virtual method that fires the 
event. The name must match the name of the event, prefixed with the word On, and 
then accept a single EventArgs argument: 


public class Stock
{
  public event EventHandler<PriceChangedEventArgs> PriceChanged;

  protected virtual void OnPriceChanged (PriceChangedEventArgs e)
  {
    if (PriceChanged != null) PriceChanged (this, e);
  }
}


To work robustly in multithreaded scenarios (Chapter 14),
you need to assign the delegate to a temporary variable before
testing and invoking it:

var temp = PriceChanged;
if (temp != null) temp (this, e);

We can achieve the same functionality without the temp variable
by using the null-conditional operator:

PriceChanged?.Invoke (this, e);

Being both thread-safe and succinct, this is the best general
way to invoke events.


This provides a central point from which subclasses can invoke or override the 
event (assuming the class is not sealed). 
Here's the complete example:

using System;

public class PriceChangedEventArgs : EventArgs
{
  public readonly decimal LastPrice;
  public readonly decimal NewPrice;

  public PriceChangedEventArgs (decimal lastPrice, decimal newPrice)
  {
    LastPrice = lastPrice; NewPrice = newPrice;
  }
}

public class Stock
{
  string symbol;
  decimal price;

  public Stock (string symbol) => this.symbol = symbol;

  public event EventHandler<PriceChangedEventArgs> PriceChanged;

  protected virtual void OnPriceChanged (PriceChangedEventArgs e)
  {
    PriceChanged?.Invoke (this, e);
  }

  public decimal Price
  {
    get => price;
    set
    {
      if (price == value) return;
      decimal oldPrice = price;
      price = value;
      OnPriceChanged (new PriceChangedEventArgs (oldPrice, price));
    }
  }
}

class Test
{
  static void Main()
  {
    Stock stock = new Stock ("THPW");
    stock.Price = 27.10M;
    // Register with the PriceChanged event
    stock.PriceChanged += stock_PriceChanged;
    stock.Price = 31.59M;
  }

  static void stock_PriceChanged (object sender, PriceChangedEventArgs e)
  {
    if ((e.NewPrice - e.LastPrice) / e.LastPrice > 0.1M)
      Console.WriteLine ("Alert, 10% stock price increase!");
  }
}
The predefined nongeneric EventHandler delegate can be used when an event
doesn’t carry extra information. In this example, we rewrite Stock such that the
PriceChanged event is fired after the price changes, and no information about the
event is necessary, other than that it happened. We also make use of the
EventArgs.Empty property in order to avoid unnecessarily instantiating an instance of
EventArgs:


public class Stock
{
  string symbol;
  decimal price;

  public Stock (string symbol) { this.symbol = symbol; }

  public event EventHandler PriceChanged;

  protected virtual void OnPriceChanged (EventArgs e)
  {
    PriceChanged?.Invoke (this, e);
  }

  public decimal Price
  {
    get { return price; }
    set
    {
      if (price == value) return;
      price = value;
      OnPriceChanged (EventArgs.Empty);
    }
  }
}
Event Accessors 


An event's accessors are the implementations of its += and -= functions. By default, 
accessors are implemented implicitly by the compiler. Consider this event 
declaration: 


public event EventHandler PriceChanged; 


The compiler converts this to the following: 


• A private delegate field

• A public pair of event accessor functions (add_PriceChanged and
  remove_PriceChanged) whose implementations forward the += and -= operations to
  the private delegate field
You can take over this process by defining explicit event accessors. Here’s a manual
implementation of the PriceChanged event from our previous example:

private EventHandler priceChanged;   // Declare a private delegate

public event EventHandler PriceChanged
{
  add    { priceChanged += value; }
  remove { priceChanged -= value; }
}


This example is functionally identical to C#’s default accessor implementation
(except that C# also ensures thread safety around updating the delegate via a
lock-free compare-and-swap algorithm; see http://albahari.com/threading). By defining
event accessors ourselves, we instruct C# not to generate default field and accessor
logic.
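To make the compare-and-swap idea concrete, here is a hand-written sketch of accessors that update the delegate field with Interlocked.CompareExchange, retrying if another thread changed the field in the meantime. It approximates, rather than reproduces, the code the compiler generates; RaisePriceChanged is a hypothetical helper added only so the demo can fire the event.

```csharp
using System;
using System.Threading;

public class Stock
{
  EventHandler priceChanged;   // Private backing delegate

  public event EventHandler PriceChanged
  {
    add
    {
      EventHandler current = priceChanged;
      EventHandler observed;
      do
      {
        observed = current;
        var combined = (EventHandler) Delegate.Combine (observed, value);
        // Store 'combined' only if the field still holds 'observed';
        // CompareExchange returns whatever the field held beforehand
        current = Interlocked.CompareExchange (ref priceChanged, combined, observed);
      }
      while (!ReferenceEquals (current, observed));   // Lost a race: retry
    }
    remove
    {
      EventHandler current = priceChanged;
      EventHandler observed;
      do
      {
        observed = current;
        var reduced = (EventHandler) Delegate.Remove (observed, value);
        current = Interlocked.CompareExchange (ref priceChanged, reduced, observed);
      }
      while (!ReferenceEquals (current, observed));
    }
  }

  // Hypothetical helper so the example can fire the event
  public void RaisePriceChanged() => priceChanged?.Invoke (this, EventArgs.Empty);
}

class Test
{
  static void Main()
  {
    var stock = new Stock();
    int calls = 0;
    EventHandler handler = (sender, e) => calls++;

    stock.PriceChanged += handler;
    stock.RaisePriceChanged();        // Handler runs
    stock.PriceChanged -= handler;
    stock.RaisePriceChanged();        // No subscribers left

    Console.WriteLine (calls);        // 1
  }
}
```

The loop compares references rather than using the delegate == operator, because the question is whether the field was replaced, not whether it holds an equivalent invocation list.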


With explicit event accessors, you can apply more complex strategies to the storage
and access of the underlying delegate. There are three scenarios for which this is
useful:

• When the event accessors are merely relays for another class that is broadcasting
  the event.

• When the class exposes many events, for which most of the time very few subscribers
  exist, such as a Windows control. In such cases, it is better to store the
  subscribers’ delegate instances in a dictionary, because a dictionary incurs
  less storage overhead than dozens of null delegate field references.

• When explicitly implementing an interface that declares an event.


Here is an example that illustrates the last point:

public interface IFoo { event EventHandler Ev; }

class Foo : IFoo
{
  private EventHandler ev;

  event EventHandler IFoo.Ev
  {
    add    { ev += value; }
    remove { ev -= value; }
  }
}
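The second scenario in the list above (many events, few subscribers) can be sketched by keeping delegates in a dictionary keyed by one static object per event, so storage is used only for events that actually have subscribers. Widget, ClickKey, KeyDownKey, AddHandler, RemoveHandler, StoredHandlerCount, and SimulateClick are all hypothetical names; the sketch is not thread-safe, and a real UI framework's implementation may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical control: delegates live in a dictionary that holds entries
// only for events that currently have subscribers
public class Widget
{
  // One key object per event
  static readonly object ClickKey = new object();
  static readonly object KeyDownKey = new object();

  readonly Dictionary<object, Delegate> handlers = new Dictionary<object, Delegate>();

  void AddHandler (object key, Delegate value)
  {
    handlers.TryGetValue (key, out Delegate existing);    // null if absent
    handlers [key] = Delegate.Combine (existing, value);
  }

  void RemoveHandler (object key, Delegate value)
  {
    if (!handlers.TryGetValue (key, out Delegate existing)) return;
    Delegate remaining = Delegate.Remove (existing, value);
    if (remaining == null) handlers.Remove (key);         // Last subscriber left
    else handlers [key] = remaining;
  }

  public event EventHandler Click
  {
    add    { AddHandler (ClickKey, value); }
    remove { RemoveHandler (ClickKey, value); }
  }

  public event EventHandler KeyDown
  {
    add    { AddHandler (KeyDownKey, value); }
    remove { RemoveHandler (KeyDownKey, value); }
  }

  public int StoredHandlerCount => handlers.Count;

  protected virtual void OnClick (EventArgs e)
  {
    if (handlers.TryGetValue (ClickKey, out Delegate d))
      ((EventHandler) d).Invoke (this, e);
  }

  public void SimulateClick() => OnClick (EventArgs.Empty);
}

class Test
{
  static void Main()
  {
    var widget = new Widget();
    EventHandler handler = (sender, e) => Console.WriteLine ("Clicked");

    widget.Click += handler;
    Console.WriteLine (widget.StoredHandlerCount);   // 1 (KeyDown has no entry)
    widget.SimulateClick();                          // Clicked
    widget.Click -= handler;
    Console.WriteLine (widget.StoredHandlerCount);   // 0
  }
}
```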


The add and remove parts of an event are compiled to add_XXX
and remove_XXX methods.
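You can observe these compiled accessor methods through reflection. This sketch uses a field-like PriceChanged event like the earlier one; Raise is a hypothetical member included only so the event is used.

```csharp
using System;
using System.Reflection;

public class Stock
{
  public event EventHandler PriceChanged;

  // Fires the event (present only so the event is used)
  public void Raise() => PriceChanged?.Invoke (this, EventArgs.Empty);
}

class Test
{
  static void Main()
  {
    EventInfo ev = typeof (Stock).GetEvent ("PriceChanged");
    Console.WriteLine (ev.GetAddMethod().Name);      // add_PriceChanged
    Console.WriteLine (ev.GetRemoveMethod().Name);   // remove_PriceChanged
  }
}
```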
Event Modifiers

Like methods, events can be virtual, overridden, abstract, or sealed. Events can also
be static:

public class Foo
{
  public static event EventHandler<EventArgs> StaticEvent;
  public virtual event EventHandler<EventArgs> VirtualEvent;
}


Lambda Expressions 


A lambda expression is an unnamed method written in place of a delegate instance. 
The compiler immediately converts the lambda expression to either of the 
following: 


• A delegate instance.

• An expression tree, of type Expression<TDelegate>, representing the code
  inside the lambda expression in a traversable object model. This allows the
  lambda expression to be interpreted later at runtime (see “Building Query
  Expressions” on page 416 in Chapter 8).
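The difference between the two conversion targets shows up when the same lambda is assigned to a Func<int,int> and to an Expression<Func<int,int>>. In this minimal sketch, the delegate can only be invoked, whereas the expression tree can be inspected as data and compiled into a delegate later.

```csharp
using System;
using System.Linq.Expressions;

class Test
{
  static void Main()
  {
    // The same lambda, converted two different ways
    Func<int, int> sqrDelegate = x => x * x;              // Executable code
    Expression<Func<int, int>> sqrExpr = x => x * x;      // Object model of the code

    Console.WriteLine (sqrDelegate (3));          // 9
    Console.WriteLine (sqrExpr.Body.NodeType);    // Multiply
    Console.WriteLine (sqrExpr.Body);             // (x * x)

    // An expression tree can be compiled into a delegate at runtime
    Func<int, int> compiled = sqrExpr.Compile();
    Console.WriteLine (compiled (4));             // 16
  }
}
```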
Given the following delegate type:
delegate int Transformer (int i);


we could assign and invoke the lambda expression x => x * x as follows: 


Transformer sqr = x => x * x; 
Console.WriteLine (sqr(3)); // 9 


Internally, the compiler resolves lambda expressions of this
type by writing a private method and then moving the
expression’s code into that method.


A lambda expression has the following form: 


(parameters) => expression-or-statement-block 
For convenience, you can omit the parentheses if and only if there is exactly one 
parameter of an inferable type. 
In our example, there is a single parameter, x, and the expression is x * x: 

x => x * x;


Each parameter of the lambda expression corresponds to a delegate parameter, and 
the type of the expression (which may be void) corresponds to the return type of 
the delegate. 
In our example, x corresponds to parameter i, and the expression x * x corresponds
to the return type int, therefore being compatible with the Transformer
delegate:


delegate int Transformer (int i); 


A lambda expression’s code can be a statement block instead of an expression. We 
can rewrite our example as follows: 


x => { return x * x; };


Lambda expressions are used most commonly with the Func and Action delegates, 
so you will most often see our earlier expression written as follows: 


Func<int,int> sqr = x => x * x; 
Here's an example of an expression that accepts two parameters: 


Func<string,string,int> totalLength = (s1, s2) => s1.Length + s2.Length; 
int total = totalLength ("hello", "world"); // total is 10; 


Explicitly Specifying Lambda Parameter Types 


The compiler can usually infer the type of lambda parameters contextually. When 
this is not the case, you must specify the type of each parameter explicitly. Consider 
the following two methods: 


void Foo<T> (T x) {} 
void Bar<T> (Action<T> a) {} 


The following code will fail to compile, because the compiler cannot infer the type 
of x: 


Bar (x => Foo (x)); // What type is x? 
We can fix this by explicitly specifying x’s type as follows: 
Bar ((int x) => Foo (x)); 
This particular example is simple enough that it can be fixed in two other ways: 


Bar<int> (x => Foo (x)); // Specify type parameter for Bar 
Bar<int> (Foo); // As above, but with method group 


Capturing Outer Variables 


A lambda expression can reference the local variables and parameters of the method 
in which it’s defined (outer variables): 


static void Main() 
{ 
int factor = 2; 
Func<int, int> multiplier = n => n * factor; 
Console.WriteLine (multiplier (3)); // 6 
} 
Outer variables referenced by a lambda expression are called captured variables. A
lambda expression that captures variables is called a closure.


Variables can also be captured by anonymous methods and 
local methods. The rules for captured variables, in these cases, 
are the same. 


Captured variables are evaluated when the delegate is actually invoked, not when the 
variables were captured: 

int factor = 2; 

Func<int, int> multiplier = n => n * factor; 


factor = 10; 
Console.WriteLine (multiplier (3)); // 30 


Lambda expressions can themselves update captured variables: 


int seed = 0; 
Func<int> natural = () => seed++; 


Console.WriteLine (natural()); // 0 
Console.WriteLine (natural()); // 1 
Console.WriteLine (seed); // 2 


Captured variables have their lifetimes extended to that of the delegate. In the
following example, the local variable seed would ordinarily disappear from scope
when Natural finished executing. But because seed has been captured, its lifetime is
extended to that of the capturing delegate, natural:


static Func<int> Natural()
{
  int seed = 0;
  return () => seed++;      // Returns a closure
}

static void Main()
{
  Func<int> natural = Natural();
  Console.WriteLine (natural());           // 0
  Console.WriteLine (natural());           // 1
}


A local variable instantiated within a lambda expression is unique per invocation of 
the delegate instance. If we refactor our previous example to instantiate seed within 
the lambda expression, we get a different (in this case, undesirable) result: 


static Func<int> Natural()
{
  return() => { int seed = 0; return seed++; };
}

static void Main()
{
  Func<int> natural = Natural();
  Console.WriteLine (natural());           // 0
  Console.WriteLine (natural());           // 0
}


Capturing is internally implemented by “hoisting” the captured
variables into fields of a private class. When the method
is called, the class is instantiated and lifetime-bound to the
delegate instance.
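As a rough illustration of hoisting, the Natural closure from the earlier example can be written by hand with an explicit class. NaturalClosure and Invoke are hypothetical names; the class the compiler generates has different, unspeakable names and may be structured differently, but the behavior is the same.

```csharp
using System;

class Test
{
  // Hand-written stand-in for the hidden class the compiler generates
  sealed class NaturalClosure
  {
    public int seed;                     // The hoisted local variable
    public int Invoke() => seed++;       // The lambda's body
  }

  public static Func<int> Natural()
  {
    var closure = new NaturalClosure();  // Instantiated when Natural is called
    closure.seed = 0;                    // Corresponds to: int seed = 0;
    return closure.Invoke;               // The delegate keeps the closure alive
  }

  static void Main()
  {
    Func<int> natural = Natural();
    Console.WriteLine (natural());   // 0
    Console.WriteLine (natural());   // 1
  }
}
```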


Capturing iteration variables 


When you capture the iteration variable of a for loop, C# treats that variable as
though it were declared outside the loop. This means that the same variable is
captured in each iteration. The following program writes 333 instead of writing 012:


Action[] actions = new Action[3]; 


for (int i = 0; i < 3; i++) 
actions [i] = () => Console.Write (i); 


foreach (Action a in actions) a(); // 333 


Each closure captures the same variable, i. (This actually makes sense when you
consider that i is a variable whose value persists between loop iterations; you can
even explicitly change i within the loop body if you want.) The consequence is that
when the delegates are later invoked, each delegate sees i’s value at the time of
invocation, which is 3. We can illustrate this better by expanding the for loop, as
follows:


Action[] actions = new Action[3];
int i = 0;

actions[0] = () => Console.Write (i);
i = 1;
actions[1] = () => Console.Write (i);
i = 2;
actions[2] = () => Console.Write (i);
i = 3;

foreach (Action a in actions) a();    // 333


The solution, if we want to write 012, is to assign the iteration variable to a local 
variable that’s scoped within the loop: 


Action[] actions = new Action[3];
for (int i = 0; i < 3; i++)
{
  int loopScopedi = i;
  actions [i] = () => Console.Write (loopScopedi);
}

foreach (Action a in actions) a();     // 012


Because loopScopedi is freshly created on every iteration, each closure captures a 
different variable. 
Prior to C# 5.0, foreach loops worked in the same way:

Action[] actions = new Action[3];
int i = 0;

foreach (char c in "abc")
  actions [i++] = () => Console.Write (c);

foreach (Action a in actions) a();     // ccc in C# 4.0


This caused considerable confusion: unlike with a for loop,
the iteration variable in a foreach loop is immutable, and so
you would expect it to be treated as local to the loop body. The
good news is that it’s been fixed since C# 5.0, and the preceding
example now writes “abc”.


Lambda Expressions Versus Local Methods 
The functionality of local methods (see “Local methods” on page 93 in Chapter 3)
overlaps with that of lambda expressions. Local methods have the following three
advantages:

• They can be recursive (they can call themselves), without ugly hacks

• They avoid the clutter of specifying a delegate type

• They incur slightly less overhead

Local methods are more efficient because they avoid the indirection of a delegate
(which costs some CPU cycles and a memory allocation). They can also access local
variables of the containing method without the compiler having to “hoist” the
captured variables into a hidden class.


However, in many cases you need a delegate—most commonly when calling a 
higher-order function, that is, a method with a delegate-typed parameter: 


public void Foo (Func<int,bool> predicate) { ... } 


(You can see plenty more of these in Chapter 8). In such cases, you need a delegate 
anyway, and it’s in precisely these cases that lambda expressions are usually terser 
and cleaner. 
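The first advantage in the list above is easiest to see with recursion. In this sketch (the method names are hypothetical), the local method calls itself directly, whereas the lambda version needs the usual workaround of declaring the delegate variable before assigning a lambda that refers to it.

```csharp
using System;

class Test
{
  public static int FactorialWithLocalMethod (int n)
  {
    return Fact (n);

    // A local method can call itself directly
    int Fact (int k) => k <= 1 ? 1 : k * Fact (k - 1);
  }

  public static int FactorialWithLambda (int n)
  {
    // The workaround: declare the variable first so the lambda can capture it
    Func<int, int> fact = null;
    fact = k => k <= 1 ? 1 : k * fact (k - 1);
    return fact (n);
  }

  static void Main()
  {
    Console.WriteLine (FactorialWithLocalMethod (5));   // 120
    Console.WriteLine (FactorialWithLambda (5));        // 120
  }
}
```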


Anonymous Methods 


Anonymous methods are a C# 2.0 feature that was mostly subsumed by C# 3.0’s lambda
expressions. An anonymous method is like a lambda expression, but it lacks the
following features:

• Implicitly typed parameters

• Expression syntax (an anonymous method must always be a statement block)

• The ability to compile to an expression tree, by assigning to Expression<T>





Anonymous Methods | 169 







To write an anonymous method, you include the delegate keyword followed 
(optionally) by a parameter declaration and then a method body. For example, given 
this delegate: 


delegate int Transformer (int i); 
we could write and call an anonymous method as follows: 


Transformer sqr = delegate (int x) {return x * x;};
Console.WriteLine (sqr(3));                            // 9

The first line is semantically equivalent to the following lambda expression:

Transformer sqr = (int x) => {return x * x;};
Or, simply:
Transformer sqr = x => x * x;


Anonymous methods capture outer variables in the same way lambda expressions 
do. 


A unique feature of anonymous methods is that you can omit 
the parameter declaration entirely—even if the delegate 
expects it. This can be useful in declaring events with a default 
empty handler: 

public event EventHandler Clicked = delegate { }; 
This avoids the need for a null check before firing the event. 
The following is also legal: 


// Notice that we omit the parameters: 
Clicked += delegate { Console.WriteLine ("clicked"); }; 
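Here is a small sketch of the default-empty-handler idiom in use; Button and its Click method are hypothetical. Because the event starts out holding an empty anonymous method, which outside code has no reference to and so cannot unsubscribe, the delegate can be invoked directly without a null check.

```csharp
using System;

public class Button
{
  // Start with an empty anonymous method so the delegate is never null
  public event EventHandler Clicked = delegate { };

  // No null check needed before invoking
  public void Click() => Clicked (this, EventArgs.Empty);
}

class Test
{
  static void Main()
  {
    var button = new Button();
    button.Click();   // No real subscribers yet: nothing happens, no exception

    // Parameters omitted, as described above
    button.Clicked += delegate { Console.WriteLine ("clicked"); };
    button.Click();   // clicked
  }
}
```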


try Statements and Exceptions 


A try statement specifies a code block subject to error-handling or cleanup code. 
The try block must be followed by one or more catch blocks, a finally block, or 
both. The catch block executes when an error is thrown in the try block. The 
finally block executes after execution leaves the try block (or if present, the catch 
block), to perform cleanup code, regardless of whether an exception was thrown. 


A catch block has access to an Exception object that contains information about 
the error. You use a catch block to either compensate for the error or rethrow the 
exception. You rethrow an exception if you merely want to log the problem or if you 
want to rethrow a new, higher-level exception type. 


A finally block adds determinism to your program: the CLR endeavors to always 
execute it. It’s useful for cleanup tasks such as closing network connections. 


A try statement looks like this: 


try
{
  ... // exception may get thrown within execution of this block
}
catch (ExceptionA ex)
{
  ... // handle exception of type ExceptionA
}
catch (ExceptionB ex)
{
  ... // handle exception of type ExceptionB
}
finally
{
  ... // cleanup code
}
Consider the following program: 

class Test
{
  static int Calc (int x) => 10 / x;

  static void Main()
  {
    int y = Calc (0);
    Console.WriteLine (y);
  }
}


Because x is zero, the runtime throws a DivideByZeroException, and our program 
terminates. We can prevent this by catching the exception as follows: 


class Test
{
  static int Calc (int x) => 10 / x;

  static void Main()
  {
    try
    {
      int y = Calc (0);
      Console.WriteLine (y);
    }
    catch (DivideByZeroException ex)
    {
      Console.WriteLine ("x cannot be zero");
    }
    Console.WriteLine ("program completed");
  }
}


OUTPUT: 
x cannot be zero 
program completed 
This is a simple example to illustrate exception handling. We
could deal with this particular scenario better in practice by 
checking explicitly for the divisor being zero before calling 
Calc. 

Checking for preventable errors is preferable to relying on 
try/catch blocks because exceptions are relatively expensive 
to handle, taking hundreds of clock cycles or more. 


When an exception is thrown within a try statement, the CLR performs a test: 


Does the try statement have any compatible catch blocks?

• If so, execution jumps to the compatible catch block, followed by the finally
  block (if present), and then execution continues normally.

• If not, execution jumps directly to the finally block (if present), then the CLR
  looks up the call stack for other try blocks; if found, it repeats the test.


If no function in the call stack takes responsibility for the exception, the program 
terminates. 
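The following sketch shows the order in which code runs as an exception propagates out of one method into a caller's catch block. Inner, Outer, and Log are hypothetical names, and the comments describe the observable order of execution.

```csharp
using System;
using System.Collections.Generic;

class Test
{
  public static readonly List<string> Log = new List<string>();

  static void Inner()
  {
    try
    {
      throw new InvalidOperationException();
    }
    finally
    {
      Log.Add ("Inner finally");   // No catch here; runs before the caller's catch
    }
  }

  public static void Outer()
  {
    Log.Clear();
    try
    {
      Inner();
    }
    catch (InvalidOperationException)
    {
      Log.Add ("Outer catch");     // Compatible handler found up the call stack
    }
    finally
    {
      Log.Add ("Outer finally");
    }
    Log.Add ("Execution continues");
  }

  static void Main()
  {
    Outer();
    Console.WriteLine (string.Join (", ", Log));
    // Inner finally, Outer catch, Outer finally, Execution continues
  }
}
```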


The catch Clause 


A catch clause specifies what type of exception to catch. This must either be
System.Exception or a subclass of System.Exception.

Catching System.Exception catches all possible errors. This is useful in the
following circumstances:

• Your program can potentially recover regardless of the specific exception type.

• You plan to rethrow the exception (perhaps after logging it).

• Your error handler is the last resort, prior to termination of the program.


More typically, though, you catch specific exception types in order to avoid having to 
deal with circumstances for which your handler wasn't designed (e.g., an 
OutOfMemoryException). 


You can handle multiple exception types with multiple catch clauses (again, this 
example could be written with explicit argument checking rather than exception 
handling): 


class Test
{
  static void Main (string[] args)
  {
    try
    {
      byte b = byte.Parse (args[0]);
      Console.WriteLine (b);
    }
    catch (IndexOutOfRangeException ex)
    {
      Console.WriteLine ("Please provide at least one argument");
    }
    catch (FormatException ex)
    {
      Console.WriteLine ("That's not a number!");
    }
    catch (OverflowException ex)
    {
      Console.WriteLine ("You've given me more than a byte!");
    }
  }
}

Only one catch clause executes for a given exception. If you want to include a safety
net to catch more general exceptions (such as System.Exception), you must put the
more-specific handlers first.


An exception can be caught without specifying a variable, if you don’t need to access 
its properties: 


catch (OverflowException)     // no variable
{
  ...
}


Furthermore, you can omit both the variable and the type (meaning that all
exceptions will be caught):


catch { ... } 


Exception filters 
You can specify an exception filter in a catch clause by adding a when clause: 


catch (WebException ex) when (ex.Status == WebExceptionStatus.Timeout)
{
  ...
}


If a WebException is thrown in this example, the Boolean expression following the 
when keyword is then evaluated. If the result is false, the catch block in question is 
ignored and any subsequent catch clauses are considered. With exception filters, it 
can be meaningful to catch the same exception type again: 


catch (WebException ex) when (ex.Status == WebExceptionStatus.Timeout)
{ ... }
catch (WebException ex) when (ex.Status == WebExceptionStatus.SendFailure)
{ ... }


The Boolean expression in the when clause can be side-effecting, as with a method 
that logs the exception for diagnostic purposes. 
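One common use of a side-effecting filter is logging: the filter records the exception and returns false, so its catch block is skipped and the exception keeps propagating. Because filters run before the stack unwinds, the log entry is written while the original call stack is still intact. LogAndContinue, Diagnostics, and Run are hypothetical names in this sketch.

```csharp
using System;
using System.Collections.Generic;

class Test
{
  public static readonly List<string> Diagnostics = new List<string>();

  // Hypothetical logger: records the exception, then returns false so the
  // catch block it guards is never entered
  static bool LogAndContinue (Exception ex)
  {
    Diagnostics.Add (ex.Message);
    return false;
  }

  public static string Run()
  {
    try
    {
      try
      {
        throw new InvalidOperationException ("Boom");
      }
      catch (Exception ex) when (LogAndContinue (ex))
      {
        return "never reached";          // The filter returned false
      }
    }
    catch (InvalidOperationException)
    {
      return "handled by outer catch";
    }
  }

  static void Main()
  {
    Console.WriteLine (Run());             // handled by outer catch
    Console.WriteLine (Diagnostics [0]);   // Boom
  }
}
```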










The finally Block 


A finally block always executes—regardless of whether an exception is thrown 
and whether the try block runs to completion. You typically use finally blocks for 
cleanup code. 


A finally block executes after any of the following: 


• A catch block finishes (or throws a new exception)

• The try block finishes (or throws an exception for which there’s no catch
  block)

• Control leaves the try block because of a jump statement (e.g., return or goto)


The only things that can defeat a finally block are an infinite loop or the process 
ending abruptly. 
A finally block helps add determinism to a program. In the following example, the 
file that we open always gets closed, regardless of whether: 

• The try block finishes normally

• Execution returns early because the file is empty (EndOfStream)

• An IOException is thrown while reading the file


static void ReadFile()
{
  StreamReader reader = null;    // In System.IO namespace
  try
  {
    reader = File.OpenText ("file.txt");
    if (reader.EndOfStream) return;
    Console.WriteLine (reader.ReadToEnd());
  }
  finally
  {
    if (reader != null) reader.Dispose();
  }
}


In this example, we closed the file by calling Dispose on the StreamReader. Calling
Dispose on an object, within a finally block, is a standard convention throughout
.NET Core and is supported explicitly in C# through the using statement.


The using statement 


Many classes encapsulate unmanaged resources, such as file handles, graphics
handles, or database connections. These classes implement System.IDisposable, which
defines a single parameterless method named Dispose to clean up these resources.
The using statement provides an elegant syntax for calling Dispose on an
IDisposable object within a finally block. Thus:
using (StreamReader reader = File.OpenText ("file.txt"))
{
  ...
}

is precisely equivalent to:

{
  StreamReader reader = File.OpenText ("file.txt");
  try
  {
    ...
  }
  finally
  {
    if (reader != null)
      ((IDisposable)reader).Dispose();
  }
}


using declarations (C# 8) 


If you omit the parentheses and statement block following a using statement, it
becomes a using declaration. The resource is then disposed when execution falls
outside the enclosing statement block:


if (File.Exists ("file.txt"))
{
  using var reader = File.OpenText ("file.txt");
  Console.WriteLine (reader.ReadLine());
}


In this case, reader will be disposed when execution falls outside the if statement 
block. 
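To see exactly when a using declaration disposes its resource, this sketch uses a hypothetical Resource class that records its lifecycle in a list. The using declaration sits inside an if block, mirroring the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable resource that records when it is opened and disposed
class Resource : IDisposable
{
  readonly List<string> log;
  public Resource (List<string> log) { this.log = log; log.Add ("Opened"); }
  public void Dispose() => log.Add ("Disposed");
}

class Test
{
  public static List<string> Run (bool condition)
  {
    var log = new List<string>();
    if (condition)
    {
      using var resource = new Resource (log);   // C# 8 using declaration
      log.Add ("Using resource");
    }                                            // Disposed when leaving this block
    log.Add ("After if block");
    return log;
  }

  static void Main()
  {
    Console.WriteLine (string.Join (", ", Run (true)));
    // Opened, Using resource, Disposed, After if block
  }
}
```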


Throwing Exceptions 


Exceptions can be thrown either by the runtime or in user code. In this example,
Display throws a System.ArgumentNullException:


class Test
{
  static void Display (string name)
  {
    if (name == null)
      throw new ArgumentNullException (nameof (name));

    Console.WriteLine (name);
  }

  static void Main()
  {
    try { Display (null); }
    catch (ArgumentNullException ex)
    {
      Console.WriteLine ("Caught the exception");
    }
  }
}


throw expressions 

throw can also appear as an expression in expression-bodied functions: 
public string Foo() => throw new NotImplementedException(); 

A throw expression can also appear in a ternary conditional expression: 


string ProperCase (string value) =>
  value == null ? throw new ArgumentException ("value") :
  value == "" ? "" :
  char.ToUpper (value[0]) + value.Substring (1);


Rethrowing an exception 


You can capture and rethrow an exception as follows: 


try { ... }
catch (Exception ex)
{
  // Log error
  ...
  throw;          // Rethrow same exception
}


If we replaced throw with throw ex, the example would still 
work, but the StackTrace property of the newly propagated 
exception would no longer reflect the original error. 


Rethrowing in this manner lets you log an error without swallowing it. It also lets 
you back out of handling an exception should circumstances turn out to be beyond 
what you expected: 


using System.Net;    // (See Chapter 16)

string s = null;
using (WebClient wc = new WebClient())
  try { s = wc.DownloadString ("http://www.albahari.com/nutshell/"); }
  catch (WebException ex)
  {
    if (ex.Status == WebExceptionStatus.Timeout)
      Console.WriteLine ("Timeout");
    else
      throw;     // Can't handle other sorts of WebException, so rethrow
  }


This can be written more tersely with an exception filter: 







catch (WebException ex) when (ex.Status == WebExceptionStatus.Timeout)
{
  Console.WriteLine ("Timeout");
}
The other common scenario is to rethrow a more specific exception type: 
try 
{ 
... // Parse a DateTime from XML element data 
} 
catch (FormatException ex) 
{ 
throw new XmlException ("Invalid DateTime", ex); 
} 


Notice that when we constructed XmlException, we passed in the original exception,
ex, as the second argument. This argument populates the InnerException
property of the new exception and aids debugging. Nearly all types of exception
offer a similar constructor.
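Because each exception can carry an InnerException, you can walk the whole chain in a loop. This sketch assumes a hypothetical ParseElementDate method that wraps a FormatException in an XmlException, following the pattern above.

```csharp
using System;
using System.Xml;

class Test
{
  // Hypothetical parser: wraps a low-level FormatException in an XmlException
  public static DateTime ParseElementDate (string text)
  {
    try
    {
      return DateTime.Parse (text);
    }
    catch (FormatException ex)
    {
      throw new XmlException ("Invalid DateTime", ex);   // ex becomes InnerException
    }
  }

  static void Main()
  {
    try
    {
      ParseElementDate ("not a date");
    }
    catch (XmlException ex)
    {
      // Walk the chain from the outermost exception inward
      for (Exception e = ex; e != null; e = e.InnerException)
        Console.WriteLine (e.GetType().Name);
      // XmlException
      // FormatException
    }
  }
}
```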


Rethrowing a less-specific exception is something you might do when crossing a 
trust boundary, so as not to leak technical information to potential hackers. 


Key Properties of System.Exception 


The most important properties of System.Exception are the following:


StackTrace 
A string representing all the methods that are called from the origin of the 
exception to the catch block. 


Message 
A string with a description of the error. 


InnerException 
The inner exception (if any) that caused the outer exception. This, itself, can 
have another InnerException. 


All exceptions in C# are runtime exceptions—there is no 
equivalent to Java's compile-time checked exceptions. 


Common Exception Types 


The following exception types are used widely throughout the CLR and .NET Core.
You can throw these yourself or use them as base classes for deriving custom
exception types.


System.ArgumentException
Thrown when a function is called with a bogus argument. This generally
indicates a program bug.
System.ArgumentNullException
Subclass of ArgumentException that’s thrown when a function argument is
(unexpectedly) null.


System.ArgumentOutOfRangeException
Subclass of ArgumentException that’s thrown when a (usually numeric) argument
is too big or too small. For example, this is thrown when passing a negative
number into a function that accepts only positive values.


System.InvalidOperationException
Thrown when the state of an object is unsuitable for a method to successfully
execute, regardless of any particular argument values. Examples include
reading an unopened file or getting the next element from an enumerator for which
the underlying list has been modified partway through the iteration.


System.NotSupportedException 
Thrown to indicate that a particular functionality is not supported. A good 
example is calling the Add method on a collection for which IsReadOnly 
returns true. 


System.NotImplementedException 
Thrown to indicate that a function has not yet been implemented. 


System.ObjectDisposedException 
Thrown when the object upon which the function is called has been disposed. 


Another commonly encountered exception type is NullReferenceException. The
CLR throws this exception when you attempt to access a member of an object
whose value is null (indicating a bug in your code). You can throw a
NullReferenceException directly (for testing purposes) as follows:


throw null; 


The TryXXX Method Pattern


When writing a method, you have a choice, when something goes wrong, to return 
some kind of failure code or throw an exception. In general, you throw an exception 
when the error is outside the normal workflow—or if you expect that the immediate 
caller won't be able to cope with it. Occasionally, though, it can be best to offer both 
choices to the consumer. An example of this is the int type, which defines two
versions of its Parse method:


public int Parse (string input); 
public bool TryParse (string input, out int returnValue); 


If parsing fails, Parse throws an exception; TryParse returns false. 

You can implement this pattern by having the XXX method call the TryXXX method:

public return-type XXX (input-type input)
{
  return-type returnValue;
  if (!TryXXX (input, out returnValue))
    throw new YYYException (...);
  return returnValue;
}
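As a concrete sketch of the pattern, here is a hypothetical Percent type whose Parse method is built on its TryParse method. The parsing rules (a whole number from 0 to 100 followed by %) are arbitrary choices made for illustration.

```csharp
using System;

// Hypothetical type: parses strings such as "42%" into an int from 0 to 100
static class Percent
{
  // TryXXX: reports failure by returning false
  public static bool TryParse (string input, out int value)
  {
    value = 0;
    if (input == null || !input.EndsWith ("%")) return false;

    string digits = input.Substring (0, input.Length - 1);
    if (!int.TryParse (digits, out int parsed)) return false;
    if (parsed < 0 || parsed > 100) return false;

    value = parsed;
    return true;
  }

  // XXX: delegates to TryXXX and throws on failure
  public static int Parse (string input)
  {
    if (!TryParse (input, out int value))
      throw new FormatException ($"'{input}' is not a percentage from 0% to 100%.");
    return value;
  }
}

class Test
{
  static void Main()
  {
    if (Percent.TryParse ("42%", out int p))
      Console.WriteLine (p);                               // 42

    Console.WriteLine (Percent.TryParse ("abc", out _));   // False

    try { Percent.Parse ("150%"); }
    catch (FormatException ex) { Console.WriteLine (ex.Message); }
  }
}
```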


Alternatives to Exceptions 


As with int.TryParse, a function can communicate failure by sending an error 
code back to the calling function via a return type or parameter. Although this can 
work with simple and predictable failures, it becomes clumsy when extended to all 
errors, polluting method signatures and creating unnecessary complexity and
clutter. It also cannot generalize to functions that are not methods, such as operators
(e.g., the division operator) or properties. An alternative is to place the error in a 
common place where all functions in the call stack can see it (e.g., a static method 
that stores the current error per thread). This, though, requires each function to 
participate in an error-propagation pattern, which is cumbersome and, ironically, 
itself error prone. 


Enumeration and Iterators 


Enumeration 


An enumerator is a read-only, forward-only cursor over a sequence of values. C# 
treats a type as an enumerator if it does any of the following: 


• Implements System.Collections.IEnumerator

• Implements System.Collections.Generic.IEnumerator<T>

• Has a public parameterless method named MoveNext and property called
  Current


The foreach statement iterates over an enumerable object. An enumerable object is 
the logical representation of a sequence. It is not itself a cursor, but an object that 
produces cursors over itself. C# treats a type as enumerable if it does any of the 
following: 


• Implements System.Collections.IEnumerable
• Implements System.Collections.Generic.IEnumerable<T>
• Has a public parameterless method named GetEnumerator that returns an
  enumerator
The enumeration pattern is as follows: 


class Enumerator // Typically implements IEnumerator or IEnumerator<T> 
{ 

public IteratorVariableType Current { get {...} } 

public bool MoveNext() {...} 
} 
class Enumerable   // Typically implements IEnumerable or IEnumerable<T>
{
  public Enumerator GetEnumerator() {...}
}
Here is the high-level way of iterating through the characters in the word beer using
a foreach statement:


foreach (char c in "beer") 
Console.WriteLine (c); 


Here is the low-level way of iterating through the characters in beer without using a 
foreach statement: 


using (var enumerator = "beer".GetEnumerator())
  while (enumerator.MoveNext())
  {
    var element = enumerator.Current;
    Console.WriteLine (element);
  }


If the enumerator implements IDisposable, the foreach statement also acts as a 
using statement, implicitly disposing the enumerator object. 


Chapter 7 explains the enumeration interfaces in further detail. 
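Because foreach binds to the pattern by name, a type does not need to implement either interface. The following minimal sketch shows this; Countdown, CountdownEnumerator, and PatternDemo are hypothetical names chosen for illustration:

```csharp
using System;

// Hypothetical enumerable: exposes GetEnumerator but implements no interface.
class Countdown
{
    readonly int _start;
    public Countdown (int start) => _start = start;

    // foreach finds this method by name (pattern-based binding).
    public CountdownEnumerator GetEnumerator() => new CountdownEnumerator (_start);
}

// Hypothetical enumerator: just MoveNext and Current.
class CountdownEnumerator
{
    int _current;
    public CountdownEnumerator (int start) => _current = start + 1;

    public int Current => _current;

    // Advances the cursor; returns false once the sequence is exhausted.
    public bool MoveNext() => --_current > 0;
}

class PatternDemo
{
    static void Main()
    {
        foreach (int n in new Countdown (3))
            Console.Write (n + " ");   // 3 2 1
    }
}
```

Since CountdownEnumerator does not implement IDisposable, foreach has nothing to dispose here.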


Collection Initializers 
You can instantiate and populate an enumerable object in a single step: 


using System.Collections.Generic; 


List<int> list = new List<int> {1, 2, 3}; 
The compiler translates this to the following: 


using System.Collections.Generic; 


List<int> list = new List<int>(); 
list.Add (1); 
list.Add (2); 
list.Add (3); 


This requires that the enumerable object implement the System.Collections.IEnumerable
interface and that it have an Add method with the appropriate number of parameters
for the call. You can similarly initialize dictionaries (see “Dictionaries” on page 344 in
Chapter 7) as follows:


var dict = new Dictionary<int, string>()
{
  { 5, "five" },
  { 10, "ten" }
};

Or, more succinctly:


var dict = new Dictionary<int, string>()
{
  [3] = "three",
  [10] = "ten"
};
The latter is valid not only with dictionaries, but with any type for which an indexer 
exists. 
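As a sketch of the collection-initializer requirements in action, the hypothetical Bag class below implements IEnumerable<string> and offers two Add overloads. The nested braces in the initializer select the two-argument overload:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: IEnumerable plus Add is all an initializer needs.
class Bag : IEnumerable<string>
{
    readonly List<string> _items = new List<string>();

    public void Add (string item) => _items.Add (item);

    // Two-parameter Add: enables { "pear", 2 } inside the initializer.
    public void Add (string item, int copies)
    {
        for (int i = 0; i < copies; i++) _items.Add (item);
    }

    public int Count => _items.Count;

    public IEnumerator<string> GetEnumerator() => _items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

class InitializerDemo
{
    static void Main()
    {
        // Roughly: var bag = new Bag(); bag.Add ("apple"); bag.Add ("pear", 2);
        var bag = new Bag { "apple", { "pear", 2 } };
        Console.WriteLine (bag.Count);                  // 3
        Console.WriteLine (string.Join (",", bag));     // apple,pear,pear
    }
}
```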


Iterators 


Whereas a foreach statement is a consumer of an enumerator, an iterator is a
producer of an enumerator. In this example, we use an iterator to return a sequence of
Fibonacci numbers (where each number is the sum of the previous two):


using System; 
using System.Collections.Generic; 


class Test 
{ 
static void Main() 
{ 
foreach (int fib in Fibs(6)) 
Console.Write (fib+" "); 
} 
static IEnumerable<int> Fibs (int fibCount) 
{ 
for (int i = 0, prevFib = 1, curFib = 1; i < fibCount; i++) 
{ 
yield return prevFib; 
int newFib = prevFib + curFib; 
prevFib = curFib; 
curFib = newFib; 
} 
} 
} 


OUTPUT: 1 1 2 3 5 8 


Whereas a return statement expresses “Here’s the value you asked me to return
from this method,” a yield return statement expresses “Here’s the next element
you asked me to yield from this enumerator.” On each yield statement, control is
returned to the caller, but the callee’s state is maintained so that the method can
continue executing as soon as the caller enumerates the next element. The lifetime of
this state is bound to the enumerator such that the state can be released when the
caller has finished enumerating.
The compiler converts iterator methods into private classes
that implement IEnumerable<T> and/or IEnumerator<T>. The
logic within the iterator block is “inverted” and spliced into
the MoveNext method and Current property on the compiler-written
enumerator class. This means that when you call an
iterator method, all you’re doing is instantiating the compiler-written
class; none of your code actually runs! Your code runs
only when you start enumerating over the resultant sequence,
typically with a foreach statement.


Iterators can be local methods (see “Local methods” on page 93 in Chapter 3). 
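To observe the deferred execution described in the note above, the following sketch has the iterator append to a log as its body runs. LazyDemo and Log are hypothetical names added for illustration:

```csharp
using System;
using System.Collections.Generic;

class LazyDemo
{
    // Records when each part of the iterator body actually runs.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add ("started");    // runs on the first MoveNext, not on the call
        yield return 1;
        Log.Add ("resumed");    // runs on the second MoveNext
        yield return 2;
    }

    static void Main()
    {
        IEnumerable<int> seq = Numbers();   // only instantiates the compiler-written class
        Console.WriteLine (Log.Count);      // 0

        foreach (int n in seq)
            Console.WriteLine ("Got " + n + "; log = " + string.Join (",", Log));
        // Got 1; log = started
        // Got 2; log = started,resumed
    }
}
```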


Iterator Semantics 


An iterator is a method, property, or indexer that contains one or more yield
statements. An iterator must return one of the following four interfaces (otherwise, the
compiler will generate an error):


// Enumerable interfaces
System.Collections.IEnumerable
System.Collections.Generic.IEnumerable<T>

// Enumerator interfaces
System.Collections.IEnumerator
System.Collections.Generic.IEnumerator<T>


An iterator has different semantics, depending on whether it returns an enumerable 
interface or an enumerator interface. We describe this in Chapter 7. 


Multiple yield statements are permitted: 


class Test
{
  static void Main()
  {
    foreach (string s in Foo())
      Console.WriteLine(s);       // Prints "One","Two","Three"
  }

  static IEnumerable<string> Foo()
  {
    yield return "One";
    yield return "Two";
    yield return "Three";
  }
}


yield break 


A return statement is illegal in an iterator block; instead you must use the yield
break statement to indicate that the iterator block should exit early, without
returning more elements. We can modify Foo as follows to demonstrate:
static IEnumerable<string> Foo (bool breakEarly)
{
  yield return "One";
  yield return "Two";

  if (breakEarly)
    yield break;

  yield return "Three";
}
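yield break is most useful inside a loop, where it ends the whole sequence rather than just the current iteration. As a sketch, the hypothetical UntilBlank iterator below yields lines up to (but not including) the first empty line:

```csharp
using System;
using System.Collections.Generic;

class YieldBreakDemo
{
    // Hypothetical helper: stops the sequence at the first blank line.
    public static IEnumerable<string> UntilBlank (IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            if (line.Length == 0)
                yield break;     // exit early: no further elements are produced
            yield return line;
        }
    }

    static void Main()
    {
        var lines = new[] { "header", "body", "", "ignored" };
        Console.WriteLine (string.Join ("|", UntilBlank (lines)));   // header|body
    }
}
```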


Iterators and try/catch/finally blocks 
A yield return statement cannot appear in a try block that has a catch clause:


IEnumerable<string> Foo()
{
  try { yield return "One"; }    // Illegal
  catch { ... }
}


Nor can yield return appear in a catch or finally block. These restrictions are 
due to the fact that the compiler must translate iterators into ordinary classes with 
MoveNext, Current, and Dispose members, and translating exception handling 
blocks would create excessive complexity. 
You can, however, yield within a try block that has (only) a finally block:

IEnumerable<string> Foo()
{
  try { yield return "One"; }    // OK
  finally { ... }
}


The code in the finally block executes when the consuming enumerator reaches 
the end of the sequence or is disposed. A foreach statement implicitly disposes the 
enumerator if you break early, making this a safe way to consume enumerators. 
When working with enumerators explicitly, a trap is to abandon enumeration early 
without disposing it, circumventing the finally block. You can avoid this risk by 
wrapping explicit use of enumerators in a using statement: 


string firstElement = null;
var sequence = Foo();
using (var enumerator = sequence.GetEnumerator())
  if (enumerator.MoveNext())
    firstElement = enumerator.Current;
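The following sketch makes the finally behavior observable. It uses a variant of Foo with a Cleaned flag that we added purely for illustration: breaking out of foreach runs the finally block, whereas an enumerator that is abandoned without Dispose leaves it pending:

```csharp
using System;
using System.Collections.Generic;

class FinallyDemo
{
    public static bool Cleaned;   // hypothetical flag set by the finally block

    public static IEnumerable<string> Foo()
    {
        try
        {
            yield return "One";
            yield return "Two";
        }
        finally
        {
            Cleaned = true;   // runs at the end of the sequence or on Dispose
        }
    }

    static void Main()
    {
        Cleaned = false;
        foreach (string s in Foo())
            break;                          // foreach still disposes the enumerator
        Console.WriteLine (Cleaned);        // True

        Cleaned = false;
        var e = Foo().GetEnumerator();
        e.MoveNext();                       // suspended inside the try block
        Console.WriteLine (Cleaned);        // False: not disposed yet
        e.Dispose();
        Console.WriteLine (Cleaned);        // True
    }
}
```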


Composing Sequences 


Iterators are highly composable. We can extend our example, this time to output 
even Fibonacci numbers only: 


using System; 
using System.Collections.Generic; 
class Test
{
  static void Main()
  {
    foreach (int fib in EvenNumbersOnly (Fibs(6)))
      Console.WriteLine (fib);
  }

  static IEnumerable<int> Fibs (int fibCount)
  {
    for (int i = 0, prevFib = 1, curFib = 1; i < fibCount; i++)
    {
      yield return prevFib;
      int newFib = prevFib + curFib;
      prevFib = curFib;
      curFib = newFib;
    }
  }

  static IEnumerable<int> EvenNumbersOnly (IEnumerable<int> sequence)
  {
    foreach (int x in sequence)
      if ((x % 2) == 0)
        yield return x;
  }
}


Each element is not calculated until the last moment—when requested by a 
MoveNext() operation. Figure 4-1 shows the data requests and output over time. 


The composability of the iterator pattern is extremely useful in LINQ; we discuss 
the subject again in Chapter 8. 
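Because each element is produced only on demand, a composed pipeline can even start from an infinite sequence, provided something downstream stops asking. In this sketch, AllFibs never terminates on its own; the hypothetical FirstN iterator (System.Linq’s Take does the same job) ends the enumeration:

```csharp
using System;
using System.Collections.Generic;

class ComposeDemo
{
    // Unbounded Fibonacci sequence: safe only because elements are pulled lazily.
    public static IEnumerable<long> AllFibs()
    {
        long prev = 1, cur = 1;
        while (true)
        {
            yield return prev;
            long next = prev + cur;
            prev = cur;
            cur = next;
        }
    }

    public static IEnumerable<long> EvenOnly (IEnumerable<long> source)
    {
        foreach (long x in source)
            if (x % 2 == 0) yield return x;
    }

    // Hypothetical helper: yields at most count elements, then stops pulling.
    public static IEnumerable<long> FirstN (IEnumerable<long> source, int count)
    {
        if (count <= 0) yield break;
        foreach (long x in source)
        {
            yield return x;
            if (--count == 0) yield break;
        }
    }

    static void Main()
    {
        foreach (long x in FirstN (EvenOnly (AllFibs()), 4))
            Console.Write (x + " ");   // 2 8 34 144
    }
}
```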
Figure 4-1. Composing sequences


Nullable Value Types 


Reference types can represent a nonexistent value with a null reference. Value types, 
however, cannot ordinarily represent null values: 


string s = null; // OK, reference type 
int i = null; // Compile error, value type cannot be null 


To represent null in a value type, you must use a special construct called a nullable 
type. A nullable type is denoted with a value type followed by the ? symbol: 


int? i = null; // OK, nullable type 
Console.WriteLine (i == null); // True 
Nullable<T> Struct 


T? translates into System.Nullable<T>, which is a lightweight immutable structure,
having only two fields, to represent Value and HasValue. The essence of
System.Nullable<T> is very simple:


public struct Nullable<T> where T : struct
{
  public T Value {get;}
  public bool HasValue {get;}
  public T GetValueOrDefault();
  public T GetValueOrDefault (T defaultValue);
  ...
}

The code:


int? i = null;
Console.WriteLine (i == null);   // True


translates to: 


Nullable<int> i = new Nullable<int>(); 
Console.WriteLine (! i.HasValue);   // True


Attempting to retrieve Value when HasValue is false throws an
InvalidOperationException. GetValueOrDefault() returns Value if HasValue is true;
otherwise, it returns new T() or a specified custom default value.


The default value of T? is null. 
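A short sketch of these members side by side (NullableDemo is a hypothetical class name):

```csharp
using System;

class NullableDemo
{
    static void Main()
    {
        int? some = 7;
        int? none = null;                                 // default value of int?

        Console.WriteLine (some.HasValue);                // True
        Console.WriteLine (none.HasValue);                // False
        Console.WriteLine (none.GetValueOrDefault());     // 0  (new int())
        Console.WriteLine (none.GetValueOrDefault (-1));  // -1 (custom default)
        Console.WriteLine (some.GetValueOrDefault (-1));  // 7

        try
        {
            Console.WriteLine (none.Value);               // HasValue is false
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine ("Value threw InvalidOperationException");
        }
    }
}
```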


Implicit and Explicit Nullable Conversions 
The conversion from T to T? is implicit, and from T? to T is explicit: 


int? x = 5; // implicit 
int y = (int)x; // explicit 


The explicit cast is directly equivalent to calling the nullable object’s Value property. 
Hence, an InvalidOperationException is thrown if HasValue is false. 


Boxing and Unboxing Nullable Values 


When T? is boxed, the boxed value on the heap contains T, not T?. This
optimization is possible because a boxed value is a reference type that can already express
null.


C# also permits the unboxing of nullable value types with the as operator. The
result will be null if the cast fails:

object o = "string";
int? x = o as int?;
Console.WriteLine (x.HasValue);   // False
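The following sketch shows both directions: a T? with a value boxes to a plain boxed T, a null T? boxes to a null reference, and a boxed T unboxes back to T? (BoxingDemo is a hypothetical class name):

```csharp
using System;

class BoxingDemo
{
    static void Main()
    {
        int? withValue = 5;
        int? empty = null;

        object boxed = withValue;                 // boxes the underlying int
        Console.WriteLine (boxed.GetType());      // System.Int32

        object boxedNull = empty;                 // null T? becomes a null reference
        Console.WriteLine (boxedNull == null);    // True

        int? back = (int?) boxed;                 // unbox a boxed int into int?
        Console.WriteLine (back);                 // 5

        object text = "string";
        int? failed = text as int?;               // wrong type: null, no exception
        Console.WriteLine (failed.HasValue);      // False
    }
}
```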


Operator Lifting 


The Nullable<T> struct does not define operators such as <, >, or even ==. Despite 
this, the following code compiles and executes correctly: 


int? x = 5;
int? y = 10; 
bool b = x < y; // true 


This works because the compiler borrows or lifts the less-than operator from the
underlying value type. Semantically, it translates the preceding comparison
expression into this:


bool b = (x.HasValue && y.HasValue) ? (x.Value < y.Value) : false; 
In other words, if both x and y have values, it compares via int’s less-than operator;
otherwise, it returns false.


Operator lifting means that you can implicitly use T’s operators on T?. You can
define operators for T? in order to provide special-purpose null behavior, but in the
vast majority of cases, it’s best to rely on the compiler automatically applying
systematic nullable logic for you. Here are some examples:


int? x = 5; 
int? y = null; 


// Equality operator examples
Console.WriteLine (x == y);      // False
Console.WriteLine (x == null);   // False
Console.WriteLine (x == 5);      // True
Console.WriteLine (y == null);   // True
Console.WriteLine (y == 5);      // False
Console.WriteLine (y != 5);      // True

// Relational operator examples
Console.WriteLine (x < 6);       // True
Console.WriteLine (y < 6);       // False
Console.WriteLine (y > 6);       // False

// All other operator examples
Console.WriteLine (x + 5);       // 10
Console.WriteLine (x + y);       // null (prints empty line)


The compiler performs null logic differently depending on the category of operator. 
The following sections explain these different rules. 


Equality operators (== and !=) 


Lifted equality operators handle nulls just like reference types do. This means that 
two null values are equal: 


Console.WriteLine ( null == null); // True 
Console.WriteLine ((bool?)null == (bool?)null); // True 


Further: 


• If exactly one operand is null, the operands are unequal.
• If both operands are non-null, their Values are compared.


Relational operators (<, <=, >=, >) 


The relational operators work on the principle that it is meaningless to compare null 
operands. This means that comparing a null value to either a null or a non-null 
value returns false: 
bool b = x < y;    // Translation:

bool b = (x.HasValue && y.HasValue)
         ? (x.Value < y.Value)
         : false;

// b is false (assuming x is 5 and y is null)


All other operators (+, -, *, /, %, &, |, ^, <<, >>, +, ++, --, !, ~)


These operators return null when any of the operands are null. This pattern should 
be familiar to SQL users: 


int? c = x + y;    // Translation:

int? c = (x.HasValue && y.HasValue)
         ? (int?) (x.Value + y.Value)
         : null;

// c is null (assuming x is 5 and y is null)


An exception is when the & and | operators are applied to bool?, which we discuss 
shortly. 


Mixing nullable and non-nullable operators 


You can mix and match nullable and non-nullable value types (this works because 
there is an implicit conversion from T to T?): 


int? a = null; 
int b = 2; 
int? c = a + b;    // c is null - equivalent to a + (int?)b


bool? with & and | Operators 


When supplied operands of type bool?, the & and | operators treat null as an
unknown value. So, null | true is true because:


• If the unknown value is false, the result would be true.
• If the unknown value is true, the result would be true.


Similarly, null & false is false. This behavior would be familiar to SQL users. The 
following example enumerates other combinations: 


bool? n = null;
bool? f = false;
bool? t = true;
Console.WriteLine (n | n);    // (null)
Console.WriteLine (n | f);    // (null)
Console.WriteLine (n | t);    // True
Console.WriteLine (n & n);    // (null)
Console.WriteLine (n & f);    // False
Console.WriteLine (n & t);    // (null)
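To print the complete three-valued truth table rather than a selection, a sketch like the following loops over all nine combinations (ThreeValuedLogic and Show are hypothetical names):

```csharp
using System;

class ThreeValuedLogic
{
    // Renders a bool? so that null is visible in the output.
    public static string Show (bool? b) => b.HasValue ? b.Value.ToString() : "null";

    static void Main()
    {
        bool?[] values = { null, false, true };

        foreach (bool? a in values)
            foreach (bool? b in values)
                Console.WriteLine (Show (a) + " | " + Show (b) + " = " + Show (a | b) + "    "
                                 + Show (a) + " & " + Show (b) + " = " + Show (a & b));
    }
}
```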
Nullable Value Types & Null Operators


Nullable value types work particularly well with the ?? operator (see “Null- 
Coalescing Operator” on page 69 in Chapter 2), as illustrated in this example: 


int? x = null;
int y = x ?? 5;        // y is 5

int? a = null, b = 1, c = 2;
Console.WriteLine (a ?? b ?? c);   // 1 (first non-null value)
Using ?? on a nullable value type is equivalent to calling GetValueOrDefault with 
an explicit default value, except that the expression for the default value is never 
evaluated if the variable is not null. 


Nullable value types also work well with the null-conditional operator (see
“Null-Conditional Operator” on page 69 in Chapter 2). In the following example, length
evaluates to null:

System.Text.StringBuilder sb = null;
int? length = sb?.ToString().Length;
We can combine this with the null-coalescing operator to evaluate to zero instead of 
null: 


int length = sb?.ToString().Length ?? 0; // Evaluates to 0 if sb is null 


Scenarios for Nullable Value Types 


One of the most common scenarios for nullable value types is to represent
unknown values. This frequently occurs in database programming, where a class is
mapped to a table with nullable columns. If these columns are strings (e.g., an
EmailAddress column on a Customer table), there is no problem because string is
a reference type in the CLR, which can be null. However, most other SQL column
types map to CLR struct types, making nullable value types very useful when
mapping SQL to the CLR:


// Maps to a Customer table in a database
public class Customer
{
  public decimal? AccountBalance;
}


A nullable type can also be used to represent the backing field of what's sometimes 
called an ambient property. An ambient property, if null, returns the value of its 
parent: 


public class Row
{
  Grid parent;
  Color? color;

  public Color Color
  {
    get { return color ?? parent.Color; }
    set { color = value == parent.Color ? (Color?)null : value; }
  }
}
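The Row snippet assumes Grid and Color types that the text does not define. The following runnable sketch supplies hypothetical stand-ins (a Color enum and a minimal Grid) plus a constructor, to show the row inheriting its parent’s color until it is given one of its own:

```csharp
using System;

// Hypothetical stand-ins for the types the Row example assumes.
enum Color { Black, Red, Blue }

class Grid
{
    public Color Color { get; set; }   // defaults to Color.Black (the enum's zero value)
}

class Row
{
    readonly Grid parent;
    Color? color;                      // null means "inherit from parent"

    public Row (Grid parent) => this.parent = parent;

    public Color Color
    {
        get { return color ?? parent.Color; }
        // Storing a value equal to the parent's as null keeps tracking the parent.
        set { color = value == parent.Color ? (Color?)null : value; }
    }
}

class AmbientDemo
{
    static void Main()
    {
        var grid = new Grid();
        var row = new Row (grid);

        Console.WriteLine (row.Color);   // Black (inherited)
        grid.Color = Color.Blue;
        Console.WriteLine (row.Color);   // Blue  (still inherited)
        row.Color = Color.Red;
        grid.Color = Color.Black;
        Console.WriteLine (row.Color);   // Red   (overridden)
    }
}
```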


Alternatives to Nullable Value Types 


Before nullable value types were part of the C# language (i.e., before C# 2.0), there
were many strategies to deal with nullable value types, examples of which still
appear in .NET Core for historical reasons. One strategy is to designate a particular
non-null value as the “null value”; an example is in the string and array classes.
string.IndexOf returns the magic value of -1 when the character is not found:

int i = "Pink".IndexOf ('b');
Console.WriteLine (i);   // -1

However, Array.IndexOf returns -1 only if the index is 0-bounded. The more
general formula is that IndexOf returns one less than the lower bound of the array. In
the next example, IndexOf returns 0 when an element is not found:


// Create an array whose lower bound is 1 instead of 0:

Array a = Array.CreateInstance (typeof (string),
                                new int[] {2}, new int[] {1});
a.SetValue ("a", 1);
a.SetValue ("b", 2);
Console.WriteLine (Array.IndexOf (a, "c"));   // 0


Nominating a “magic value” is problematic for several reasons: 


• It means that each value type has a different representation of null. In contrast,
  nullable value types provide one common pattern that works for all value types.
• There might be no reasonable designated value. In the previous example, -1
  could not always be used. The same is true for our earlier example representing
  an unknown account balance.
• Forgetting to test for the magic value results in an incorrect value that might go
  unnoticed until later in execution, when it pulls an unintended magic trick.
  Forgetting to test HasValue on a null value, however, throws an
  InvalidOperationException on the spot.
• The ability for a value to be null is not captured in the type. Types communicate
  the intention of a program, allow the compiler to check for correctness,
  and enable a consistent set of rules enforced by the compiler.

Nullable Reference Types (C# 8)


Whereas nullable value types bring nullability to value types, nullable reference types 
do the opposite and bring (a degree of) non-nullability to reference types, with the 
purpose of helping to avoid NullReferenceExceptions. 


Nullable reference types introduce a level of safety that’s enforced purely by the 
compiler, in the form of warnings when it detects code that’s at risk of generating a 
NullReferenceException. 


To enable nullable reference types, you must either add the Nullable element to 
your .csproj project file (if you want to enable it for the entire project): 


<PropertyGroup> 
<Nullable>enable</Nullable> 
</PropertyGroup> 
and/or use the following directives in your code, in the places where it should take 
effect: 


#nullable enable    // enables nullable reference types from this point on
#nullable disable   // disables nullable reference types from this point on
#nullable restore   // resets nullable reference types to project setting


After being enabled, the compiler makes non-nullability the default: if you want a
reference type to accept nulls, you must apply the ? suffix to indicate a nullable
reference type. In the following example, s1 is non-nullable, whereas s2 is nullable:

#nullable enable    // Enable nullable reference types

string s1 = null;   // Generates a compiler warning!
string? s2 = null;  // OK: s2 is nullable reference type


Because nullable reference types are compile-time constructs, 
there’s no runtime difference between string and string?. In 
contrast, nullable value types introduce something concrete 
into the type system, namely the Nullable<T> struct. 


The following also generates a warning because x is not initialized: 
class Foo { string x; } 
The warning disappears if you initialize x, either via a field initializer, or via code in 
the constructor. 
The Null-Forgiving Operator 


The compiler also warns you upon dereferencing a nullable reference type, if it
thinks a NullReferenceException might occur. In the following example, accessing
the string’s Length property generates a warning:

void Foo (string? s) => Console.Write (s.Length);
You can remove the warning with the null-forgiving operator (!):

void Foo (string? s) => Console.Write (s!.Length);

Our use of the null-forgiving operator in this example is dangerous in that we could
end up throwing the very NullReferenceException we were trying to avoid in the
first place. We could fix it as follows:

void Foo (string? s)
{
  if (s != null) Console.Write (s.Length);
}

Notice that now we don’t need the null-forgiving operator. This is because the
compiler performs static flow analysis and is smart enough to infer (at least in simple
cases) when a dereference is safe and there’s no chance of a
NullReferenceException.


The compiler’s ability to detect and warn is not bulletproof, and there are also limits 
to what's possible in terms of coverage. For instance, the compiler is unable to know 
whether an array’s elements have been populated, and so the following does not 
generate a warning: 


var strings = new string[10]; 
Console.WriteLine (strings[0].Length); 
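As a sketch of writing code that flow analysis accepts without the null-forgiving operator, the two methods below handle the null case explicitly, one with an early return and one with ?. combined with ??. To our understanding, neither produces a nullable warning under #nullable enable; NullableRefDemo and its methods are hypothetical names:

```csharp
#nullable enable
using System;

class NullableRefDemo
{
    public static int SafeLength (string? s)
    {
        if (s is null) return 0;   // after this check, s is treated as non-null
        return s.Length;           // safe dereference: no warning expected
    }

    // Same result, collapsing the null case with ?. and ??
    public static int SafeLength2 (string? s) => s?.Length ?? 0;

    static void Main()
    {
        Console.WriteLine (SafeLength (null));      // 0
        Console.WriteLine (SafeLength2 ("Perth"));  // 5
    }
}
```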


Separating the Annotation and Warning Contexts 


Enabling nullable reference types via the #nullable enable directive (or the
<Nullable>enable</Nullable> project setting) does two things:

• It enables the nullable annotation context, which tells the compiler to treat all
  reference-type variable declarations as non-nullable unless suffixed by the ?
  symbol.
• It enables the nullable warning context, which tells the compiler to generate
  warnings upon encountering code at risk of throwing a
  NullReferenceException.


It can sometimes be useful to separate these two concepts and enable just the
annotation context, or (less usefully) just the warning context:

#nullable enable annotations    // Enable the annotation context
// OR:
#nullable enable warnings       // Enable the warning context

(The same trick works with #nullable disable and #nullable restore.)
You can also do it via the project file:

<Nullable>annotations</Nullable>
<!-- OR -->
<Nullable>warnings</Nullable>
Enabling just the annotation context for a particular class or assembly can be a good
first step in introducing nullable reference types into a legacy codebase. By correctly
annotating public members, your class or assembly can act as a good citizen to other
classes or assemblies (so that they can benefit fully from nullable reference types)
without having to deal with warnings in your own class or assembly.


Treating Nullable Warnings as Errors 


In greenfield projects, it makes sense to fully enable the nullable context from the 
outset. You might want to take the additional step of treating nullable warnings as 
errors so that your project cannot compile until all null-warnings have been 
resolved: 


<PropertyGroup> 
<Nullable>enable</Nullable> 
  <WarningsAsErrors>CS8600;CS8602;CS8603</WarningsAsErrors>
</PropertyGroup> 


Extension Methods 


Extension methods allow an existing type to be extended with new methods without 
altering the definition of the original type. An extension method is a static method 
of a static class, where the this modifier is applied to the first parameter. The type 
of the first parameter will be the type that is extended: 
public static class StringHelper
{
  public static bool IsCapitalized (this string s)
  {
    if (string.IsNullOrEmpty(s)) return false;
    return char.IsUpper (s[0]);
  }
}


The IsCapitalized extension method can be called as though it were an instance 
method on a string, as follows: 


Console.WriteLine ("Perth".IsCapitalized()); 


An extension method call, when compiled, is translated back into an ordinary static 
method call: 


Console.WriteLine (StringHelper.IsCapitalized ("Perth")); 
The translation works as follows: 


arg0.Method (arg1, arg2, ...);              // Extension method call
StaticClass.Method (arg0, arg1, arg2, ...); // Static method call


Interfaces can be extended, too: 


public static T First<T> (this IEnumerable<T> sequence)
{
  foreach (T element in sequence)
    return element;

  throw new InvalidOperationException ("No elements!");
}
...
Console.WriteLine ("Seattle".First());   // S


Extension Method Chaining 


Extension methods, like instance methods, provide a tidy way to chain functions. 
Consider the following two functions: 


public static class StringHelper
{
  public static string Pluralize (this string s) {...}
  public static string Capitalize (this string s) {...}
}


x and y are equivalent, and both evaluate to "Sausages", but x uses extension
methods, whereas y uses static methods:


string x = "sausage".Pluralize().Capitalize(); 
string y = StringHelper.Capitalize (StringHelper.Pluralize ("sausage")); 
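The text leaves the bodies of Pluralize and Capitalize elided. The following sketch fills them with deliberately naive, hypothetical implementations (Pluralize just appends "s") so that the chain can run end to end:

```csharp
using System;

public static class StringHelper
{
    // Hypothetical, naive rule: append "s". Real pluralization is far more involved.
    public static string Pluralize (this string s) => s + "s";

    // Uppercases the first character; null or empty strings are returned unchanged.
    public static string Capitalize (this string s) =>
        string.IsNullOrEmpty (s) ? s : char.ToUpper (s[0]) + s.Substring (1);
}

class ChainDemo
{
    static void Main()
    {
        string x = "sausage".Pluralize().Capitalize();
        string y = StringHelper.Capitalize (StringHelper.Pluralize ("sausage"));
        Console.WriteLine (x);        // Sausages
        Console.WriteLine (x == y);   // True
    }
}
```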


Ambiguity and Resolution 


Namespaces 


An extension method cannot be accessed unless its class is in scope, typically by its 
namespace being imported. Consider the extension method IsCapitalized in the 
following example: 


using System;

namespace Utils
{
  public static class StringHelper
  {
    public static bool IsCapitalized (this string s)
    {
      if (string.IsNullOrEmpty(s)) return false;
      return char.IsUpper (s[0]);
    }
  }
}

To use IsCapitalized, the following application must import Utils in order to
avoid a compile-time error:

namespace MyApp
{
  using Utils;

  class Test
  {
    static void Main() => Console.WriteLine ("Perth".IsCapitalized());
  }
}


Extension methods versus instance methods 


Any compatible instance method will always take precedence over an extension 
method. In the following example, Test’s Foo method will always take precedence, 
even when called with an argument x of type int: 


class Test
{
  public void Foo (object x) { }    // This method always wins
}

static class Extensions
{
  public static void Foo (this Test t, int x) { }
}


The only way to call the extension method in this case is via normal static syntax; in 
other words, Extensions.Foo(...). 
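To make the precedence observable, this sketch adapts the example: both methods return a string identifying themselves instead of void, and Test is renamed to the hypothetical Gadget:

```csharp
using System;

class Gadget
{
    public string Foo (object x) => "instance";   // applicable for an int (via boxing), so it wins
}

static class Extensions
{
    public static string Foo (this Gadget g, int x) => "extension";
}

class PrecedenceDemo
{
    static void Main()
    {
        var g = new Gadget();
        Console.WriteLine (g.Foo (123));               // instance
        Console.WriteLine (Extensions.Foo (g, 123));   // extension (static call syntax)
    }
}
```

The extension method is considered only when no applicable instance method exists, which is why the better-matching int parameter does not help it here.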
Extension methods versus extension methods


If two extension methods have the same signature, the extension method must be
called as an ordinary static method to disambiguate the method to call. If one
extension method has more specific arguments, however, the more specific method takes
precedence.


To illustrate, consider the following two classes: 


static class StringHelper
{
  public static bool IsCapitalized (this string s) {...}
}
static class ObjectHelper
{
  public static bool IsCapitalized (this object s) {...}
}

The following code calls StringHelper’s IsCapitalized method:

bool test1 = "Perth".IsCapitalized();


Classes and structs are considered more specific than interfaces. 


Anonymous Types 


An anonymous type is a simple class created by the compiler on the fly to store a set 
of values. To create an anonymous type, use the new keyword followed by an object 
initializer, specifying the properties and values the type will contain; for example: 


var dude = new { Name = "Bob", Age = 23 }; 
The compiler translates this to (approximately) the following:

internal class AnonymousGeneratedTypeName
{
  private string name;  // Actual field name is irrelevant
  private int    age;   // Actual field name is irrelevant

  public AnonymousGeneratedTypeName (string name, int age)
  {
    this.name = name; this.age = age;
  }

  public string Name { get { return name; } }
  public int    Age  { get { return age;  } }

  // The Equals and GetHashCode methods are overridden (see Chapter 6).
  // The ToString method is also overridden.
}

var dude = new AnonymousGeneratedTypeName ("Bob", 23);


You must use the var keyword to reference an anonymous type because it doesn’t 
have a name. 


The property name of an anonymous type can be inferred from an expression that 
is itself an identifier (or ends with one); thus: 


int Age = 23;
var dude = new { Name = "Bob", Age, Age.ToString().Length };

is equivalent to:

var dude = new { Name = "Bob", Age = Age, Length = Age.ToString().Length };


Two anonymous type instances declared within the same assembly will have the 
same underlying type if their elements are named and typed identically: 


var a1 = new { X = 2, Y = 4 }; 
var a2 = new { X = 2, Y = 4 }; 
Console.WriteLine (a1.GetType() == a2.GetType()); // True 


Additionally, the Equals method is overridden to perform equality comparisons: 


Console.WriteLine (a1 == a2); // False 
Console.WriteLine (a1.Equals (a2)); // True 
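The overridden members can be seen together in the following sketch: ToString lists the properties, Equals and GetHashCode are value-based, and == still compares references (AnonymousDemo is a hypothetical class name):

```csharp
using System;

class AnonymousDemo
{
    static void Main()
    {
        var a1 = new { X = 2, Y = 4 };
        var a2 = new { X = 2, Y = 4 };
        var a3 = new { X = 2, Y = 5 };

        Console.WriteLine (a1);                                    // { X = 2, Y = 4 }
        Console.WriteLine (a1.Equals (a2));                        // True  (value equality)
        Console.WriteLine (a1.GetHashCode() == a2.GetHashCode());  // True
        Console.WriteLine (a1.Equals (a3));                        // False
        Console.WriteLine (ReferenceEquals (a1, a2));              // False (two distinct objects)
    }
}
```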


You can create arrays of anonymous types, as follows: 


var dudes = new[]
{
  new { Name = "Bob", Age = 30 },
  new { Name = "Tom", Age = 40 }
};

A method cannot (usefully) return an anonymously typed object, because it is illegal
to write a method whose return type is var:


var Foo() => new { Name = "Bob", Age = 30 }; // Not legal! 


Instead, you must use object or dynamic, and then whoever calls Foo must rely on 
dynamic binding, with loss of static type safety (and IntelliSense in Visual Studio). 


dynamic Foo() => new { Name = "Bob", Age = 30 }; // No static type safety. 


Anonymous types are particularly useful when writing LINQ queries (see 
Chapter 8). 


Tuples 


Like anonymous types, tuples provide a simple way to store a set of values. The 
main purpose of tuples is to safely return multiple values from a method without 
resorting to out parameters (something you cannot do with anonymous types). 


Tuples do almost everything that anonymous types do and 
more. Their one disadvantage—as you'll see soon—is runtime 
type erasure with named elements. 
The simplest way to create a tuple literal is to list the desired values in parentheses.
This creates a tuple with unnamed elements, which you refer to as Item1, Item2, and
so on:


var bob = ("Bob", 23); // Allow compiler to infer the element types 


Console.WriteLine (bob.Item1); // Bob 
Console.WriteLine (bob.Item2); // 23 


Tuples are value types, with mutable (read/write) elements: 


var joe = bob; // joe is a *copy* of bob 

joe.Item1 = "Joe"; // Change joe's Item1 from Bob to Joe 
Console.WriteLine (bob); // (Bob, 23) 

Console.WriteLine (joe); // (Joe, 23) 


Unlike with anonymous types, you can specify a tuple type explicitly. Just list each of 
the element types in parentheses: 


(string,int) bob = ("Bob", 23); 
This means that you can usefully return a tuple from a method: 


static (string,int) GetPerson() => ("Bob", 23); 


static void Main() 

{ 
(string,int) person = GetPerson(); // Could use 'var' instead if we want 
Console.WriteLine (person.Item1); // Bob 
Console.WriteLine (person.Item2); // 23 


} 
Tuples play well with generics, so the following types are all legal:

Task<(string,int)>
Dictionary<(string,int),Uri>
IEnumerable<(int id, string name)>    // See below for naming elements


Naming Tuple Elements 
You can optionally give meaningful names to elements when creating tuple literals: 


var tuple = (name:"Bob", age:23); 


Console.WriteLine (tuple.name);   // Bob
Console.WriteLine (tuple.age); // 23 


You can do the same when specifying tuple types: 
static (string name, int age) GetPerson() => ("Bob", 23); 


static void Main() 


{ 
var person = GetPerson(); 
  Console.WriteLine (person.name);   // Bob
  Console.WriteLine (person.age);    // 23
} 
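Named tuples are a natural fit for methods that compute several results in one pass. The following sketch uses a hypothetical MinMax method that returns both extremes of an array without out parameters:

```csharp
using System;

class TupleDemo
{
    // Returns two values at once; the element names are part of the return type.
    public static (int min, int max) MinMax (int[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException ("Array is empty", nameof (values));

        int min = values[0], max = values[0];
        foreach (int v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }

    static void Main()
    {
        var range = MinMax (new[] { 3, 9, -2, 7 });
        Console.WriteLine (range.min);     // -2
        Console.WriteLine (range.Item2);   // 9  (ItemN names still work)
        Console.WriteLine (range);         // (-2, 9)
    }
}
```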


Note that you can still treat the elements as unnamed and refer to them as Item1, 
Item2, etc. (although Visual Studio hides these fields from IntelliSense). 


Element names are automatically inferred from property or field names: 


var now = DateTime.Now; 
var tuple = (now.Day, now.Month, now.Year); 
Console.WriteLine (tuple.Day); // OK 


Tuples are type compatible with one another if their element types match up (in 
order). Their element names need not: 


(string name, int age, char sex) bob1 = ("Bob", 23, 'M'); 
(string age, int sex, char name) bob2 = bob1; // No error! 


Our particular example leads to confusing results: 


Console.WriteLine (bob2.name);   // M
Console.WriteLine (bob2.age);    // Bob
Console.WriteLine (bob2.sex);    // 23
Type erasure 


We stated previously that the C# compiler handles anonymous types by building 
custom classes with named properties for each of the elements. With tuples, C# 
works differently and uses a preexisting family of generic structs: 


public struct ValueTuple<T1> 
public struct ValueTuple<T1,T2> 
public struct ValueTuple<T1,T2,T3>
...

Each of the ValueTuple<> structs has fields named Item1, Item2, and so on.


Hence, (string, int) is an alias for ValueTuple<string,int>, and this means that 
named tuple elements have no corresponding property names in the underlying 
types. Instead, the names exist only in the source code, and in the imagination of 
the compiler. At runtime, the names mostly disappear, so if you decompile a pro- 
gram that refers to named tuple elements, you'll see just references to Item1, Item2, 
and so on. Further, when you examine a tuple variable in a debugger after having 
assigned it to an object (or Dump it in LINQPad), the element names are not there. 
And for the most part, you cannot use reflection (Chapter 19) to determine a tuple’s 
element names at runtime. 


We said that the names mostly disappear because there's an 
exception. With methods/properties that return named tuple 
types, the compiler emits the element names by applying a 
custom attribute called TupleElementNamesAttribute (see 
“Attributes” on page 204) to the member's return type. This 
allows named elements to work when calling methods in a dif- 
ferent assembly (for which the compiler does not have the 
source code). 
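The following sketch makes both points observable. RuntimeType boxes a named tuple and asks for its type, which comes back as plain ValueTuple<string,int>; DeclaredNames uses reflection to read the TupleElementNamesAttribute that the compiler attaches to GetPerson's return value. The class and method names are hypothetical; TupleElementNamesAttribute and its TransformNames property are the real types in System.Runtime.CompilerServices.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class TupleNames
{
  public static (string name, int age) GetPerson() => ("Bob", 23);

  // At runtime the value is just a ValueTuple<string,int>; no names are attached
  public static Type RuntimeType()
  {
    object boxed = GetPerson();
    return boxed.GetType();
  }

  // The names survive only as an attribute on the method's return value
  public static string[] DeclaredNames()
  {
    MethodInfo method = typeof (TupleNames).GetMethod (nameof (GetPerson));
    var attr = method.ReturnParameter.GetCustomAttribute<TupleElementNamesAttribute>();
    string[] names = new string [attr.TransformNames.Count];
    attr.TransformNames.CopyTo (names, 0);
    return names;
  }
}
```

RuntimeType() returns typeof (ValueTuple<string,int>), and DeclaredNames() returns { "name", "age" }. The attribute describes the method's signature only; a tuple held in a local variable has no such record.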


ValueTuple.Create 


You can also create tuples via a factory method on the (nongeneric) ValueTuple 
type: 


ValueTuple<string,int> bob1 = ValueTuple.Create ("Bob", 23); 
(string, int) bob2 = ValueTuple.Create ("Bob", 23); 


You cannot create named elements in this way, because element naming relies on 
compiler magic. 


Deconstructing Tuples 


Tuples implicitly support the deconstruction pattern (see “Deconstructors” on page 
95 in Chapter 3), so you can easily deconstruct a tuple into individual variables. So, 
instead of doing this: 


var bob = ("Bob", 23); 


string name = bob.Item1; 
int age = bob.Item2; 


you can do this: 


var bob = ("Bob", 23); 


(string name, int age) = bob; // Deconstruct the bob tuple into
                              // separate variables (name and age).
Console.WriteLine (name); 
Console.WriteLine (age); 
The syntax for deconstruction is confusingly similar to the syntax for declaring a
tuple with named elements. The following highlights the difference: 


(string name, int age) = bob; // Deconstructing a tuple 
(string name, int age) bob2 = bob; // Declaring a new tuple 


Here's another example, this time when calling a method, and with type inference 
(var): 


static (string, int, char) GetBob() => ("Bob", 23, 'M');


static void Main() 


{ 
var (name, age, sex) = GetBob(); 
Console.WriteLine (name); // Bob 
Console.WriteLine (age); // 23 
Console.WriteLine (sex); // M
} 
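Deconstruction also works with variables that already exist, not only with new declarations. A neat consequence is the tuple swap idiom in this sketch (TupleSwap and Swap are hypothetical names): the values on the right-hand side are evaluated before either variable is assigned, so no temporary variable is needed.

```csharp
public static class TupleSwap
{
  public static (int a, int b) Swap (int a, int b)
  {
    (a, b) = (b, a);   // deconstruct into existing variables (no declarations)
    return (a, b);
  }
}
```

Swap (1, 2) returns a tuple whose a element is 2 and whose b element is 1.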
Equality Comparison 


As with anonymous types, the ValueTuple<> types override the Equals method to 
allow equality comparisons to work meaningfully: 


var t1 = ("one", 1);
var t2 = ("one", 1); 
Console.WriteLine (t1.Equals (t2)); // True 


In addition, ValueTuple<> overloads the == and != operators: 
Console.WriteLine (t1 == t2); // True (from C# 7.3) 


They also override the GetHashCode method, making it practical to use tuples as 
keys in dictionaries. We cover equality comparison in detail in “Equality
Comparison” on page 296 in Chapter 6, and “Dictionaries” on page 344 in Chapter 7.


The ValueTuple<> types also implement IComparable (see “Order Comparison” on 
page 306 in Chapter 6), making it possible to use tuples as a sorting key. 
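To see the ordering in action, the sketch below sorts (priority, id) pairs with List<T>.Sort, which relies on ValueTuple's comparison: elements are compared left to right, so id matters only when priorities are equal. TupleOrdering and Sorted are hypothetical names.

```csharp
using System.Collections.Generic;

public static class TupleOrdering
{
  // Sorts by priority first, then by id when priorities tie
  public static List<(int priority, int id)> Sorted (IEnumerable<(int priority, int id)> items)
  {
    var list = new List<(int priority, int id)> (items);
    list.Sort();   // uses ValueTuple's IComparable implementation
    return list;
  }
}
```

Sorting (2, 1), (1, 5), (1, 3) yields (1, 3), (1, 5), (2, 1).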


The System.Tuple classes 


You'll find another family of generic types in the System namespace called Tuple 
(rather than ValueTuple). These were introduced in .NET Framework 4.0 and are 
classes (whereas the ValueTuple types are structs). Defining tuples as classes was in 
retrospect considered a mistake: in the typical scenarios in which tuples are used, 
structs have a slight performance advantage (in that they avoid unnecessary mem- 
ory allocations), with almost no downside. Hence, when Microsoft added language 
support for tuples (in C# 7), it ignored the existing Tuple types in favor of the new 
ValueTuple. You might still come across the Tuple classes in code written prior to 
C# 7. They have no special language support and are used as follows: 


Tuple<string,int> t = Tuple.Create ("Bob", 23); // Factory method 
Console.WriteLine (t.Item1); // Bob 
Console.WriteLine (t.Item2); // 23 
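The practical difference between the two families shows up on assignment. In this sketch (TupleSemantics and its methods are hypothetical names), copying a ValueTuple copies its fields, which are writable; copying a Tuple reference leaves both variables pointing at the same object, whose Item1 and Item2 are read-only properties.

```csharp
using System;

public static class TupleSemantics
{
  // ValueTuple is a mutable struct: assignment copies the whole value
  public static int ValueTupleCopy()
  {
    var original = (x: 1, y: 2);
    var copy = original;
    copy.x = 99;          // fields are public and writable
    return original.x;    // still 1
  }

  // Tuple is a class: assignment copies the reference only
  public static bool TupleSharesReference()
  {
    Tuple<int,int> first = Tuple.Create (1, 2);
    Tuple<int,int> second = first;
    return object.ReferenceEquals (first, second);   // one object, two references
  }
}
```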
Patterns


In Chapter 3, we demonstrated how to use the is operator to test whether a refer- 
ence conversion will succeed: 


if (obj is string) 
Console.WriteLine (((string)obj).Length);


Or, more concisely: 


if (obj is string s) 
Console.WriteLine (s.Length); 


This employs one kind of pattern called a type pattern. The is operator also sup- 
ports other patterns that were introduced in C# 7 and C# 8, such as the property 
pattern: 


if (obj is string { Length:4 }) 
Console.WriteLine ("A string with 4 characters"); 


Patterns are supported in the following contexts: 


• After the is operator (variable is pattern)
• In switch statements
• In switch expressions


We've already covered the type pattern (and briefly, the tuple pattern) in “Switching 
on types” on page 75 in Chapter 2, and “The is operator” on page 110 in Chapter 3. 
In this section, we cover more advanced patterns that were introduced in C# 7 and 
C# 8. Most of these patterns are intended for use in switch statements/expressions, 
where they do the following: 


• Reduce the need for when clauses
• Let you use switches where you couldn't previously


The patterns in this section are mild-to-moderately useful in 
some scenarios. Remember that you can always replace highly 
patterned switch expressions with simple if statements—or 
in some cases, the ternary conditional operator—and often 
without much extra code. 


Property Patterns (C# 8) 


A property pattern matches on one or more of an object's property values. We gave 
a simple example previously in the context of the is operator: 


if (obj is string { Length:4 }) ... 
However, this doesn’t save much over the following: 


if (obj is string s && s.Length == 4) ... 
With switch statements and expressions, property patterns are more useful. Consider
the System.Uri class, which represents a URI. It has properties that include
Scheme, Host, Port, and IsLoopback. In writing a firewall, we could decide whether 
to allow or block a URI by employing a switch expression that uses property 
patterns: 


bool ShouldAllow (Uri uri) => uri switch
{
  { Scheme: "http",  Port: 80  } => true,
  { Scheme: "https", Port: 443 } => true,
  { Scheme: "ftp",   Port: 21  } => true,
  { IsLoopback: true           } => true,
  _ => false
};
You can nest properties, making the following clause legal: 
{ Scheme: string { Length: 4 }, Port: 80 } => true, 


Matching is always based on type and equality. Should you need to apply some 
other operator (such as less-than), you must use a when clause: 


{ Scheme: "http", Port: 80 } when uri.Host.Length < 1000 => true, 
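Putting the pieces together, here is a compilable version of the firewall switch with the when clause folded into the http case. It is a sketch only: the Firewall class name is hypothetical, and the rules are the illustrative ones from the text rather than a real security policy. A null uri matches none of the property patterns (they require a non-null object), so it falls through to the discard.

```csharp
using System;

public static class Firewall
{
  public static bool ShouldAllow (Uri uri) => uri switch
  {
    { Scheme: "http",  Port: 80  } when uri.Host.Length < 1000 => true,
    { Scheme: "https", Port: 443 } => true,
    { Scheme: "ftp",   Port: 21  } => true,
    { IsLoopback: true }           => true,
    _ => false   // also catches null
  };
}
```

With this version, http://example.com (default port 80) is allowed, http://127.0.0.1:8080 is allowed only because it is a loopback address, and https://example.com:8443 is blocked.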
You can combine the type pattern with the property pattern: 


bool ShouldAllow (object uri) => uri switch
{
  Uri { Scheme: "http",  Port: 80  } => true,
  Uri { Scheme: "https", Port: 443 } => true,
  ...
};

As you might expect with type patterns, you can introduce a variable at the end of a 
clause and then consume that variable: 


Uri { Scheme: "http", Port: 80 } httpUri => httpUri.Host.Length < 1000, 
You can also use that variable in a when clause: 


Uri { Scheme: "http", Port: 80 } httpUri 
when httpUri.Host.Length < 1000 => true, 


A somewhat bizarre twist with property patterns is that you can also introduce vari- 
ables at the property level: 


{ Scheme: "http", Port: 80, Host: string host } => host.Length < 1000, 


Implicit typing is permitted, so you can substitute string with var. Here's a com- 
plete example: 


bool ShouldAllow (Uri uri) => uri switch 


{ 
{ Scheme: "http", Port: 80, Host: var host } => host.Length < 1000, 
{ Scheme: "https", Port: 443 } => true, 
{ Scheme: "ftp", Port: 21 } => true, 
{ IsLoopback: true } => true, 
  _ => false
};
It's difficult to invent examples for which this saves more than a few characters. In
our case, the alternative is actually shorter: 


{ Scheme: "http", Port: 80 } => uri.Host.Length < 1000, 


Tuple Patterns (C# 8) 
Tuple patterns provide a simple mechanism for switching on multiple values: 


enum Season { Spring, Summer, Fall, Winter }; 


int AverageCelsiusTemperature (Season season, bool daytime) => 
(season, daytime) switch 
{ 
(Season.Spring, true) => 20, 
(Season.Spring, false) => 16, 
(Season.Summer, true) => 27, 
(Season.Summer, false) => 22, 
(Season.Fall, true) => 18, 
(Season.Fall, false) => 12, 
(Season.Winter, true) => 10, 
(Season.Winter, false) => -2, 
_ => throw new Exception ("Unexpected combination") 


};
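Tuple patterns combine well with the discard (_), which matches any value in its position. As a small illustration (FizzBuzz and Say are hypothetical names), the classic FizzBuzz rules reduce to a switch on the pair of remainders. The arms are tested in order, so (0, 0) must come before (0, _) and (_, 0).

```csharp
public static class FizzBuzz
{
  // Switches on two computed values at once
  public static string Say (int n) => (n % 3, n % 5) switch
  {
    (0, 0) => "FizzBuzz",
    (0, _) => "Fizz",
    (_, 0) => "Buzz",
    _      => n.ToString()
  };
}
```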
Positional Patterns (C# 8) 


For types that define a Deconstruct method (see “Deconstructors” on page 95 in 
Chapter 3), such as the Point class in the following example: 


class Point
{
  public readonly int X, Y;
  public Point (int x, int y) => (X, Y) = (x, y);
  public void Deconstruct (out int x, out int y)
  {
    x = X; y = Y;
  }
}
you can use the object’s positional properties for pattern matching: 


var p = new Point (2, 3); 
Console.WriteLine (p is (2, 3)); // true 


With a switch: 


string Print (object obj) => obj switch
{
  Point (0, 0) => "Empty point",
  Point (var x, var y) when x == y => "Diagonal",
  ...
};
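For reference, here is a self-contained version of the Point example with a complete switch. The Point class follows the text; PointPrinter and the last two arms (a positional pattern that discards Y, and a final catch-all) are illustrative additions. Without the final _ arm, the compiler would warn that the switch expression is not exhaustive, and an unmatched input would throw at runtime.

```csharp
public class Point
{
  public readonly int X, Y;
  public Point (int x, int y) => (X, Y) = (x, y);

  // Enables positional patterns such as Point (0, 0)
  public void Deconstruct (out int x, out int y)
  {
    x = X;
    y = Y;
  }
}

public static class PointPrinter
{
  public static string Print (object obj) => obj switch
  {
    Point (0, 0)                     => "Empty point",
    Point (var x, var y) when x == y => "Diagonal",
    Point (var x, _)                 => $"Point with X = {x}",
    _                                => "Not a point"
  };
}
```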
var Pattern


The var pattern was introduced in C# 7 and is a variation of the type pattern 
whereby you replace the type name with the var keyword. The conversion always 
succeeds, so its purpose is merely to let you reuse the variable that follows: 


bool Test (int x, int y) => 
x * y is var product && product > 10 && product < 100; 


Without this feature, you'd need to do this: 


bool Test (int x, int y) 
{ 


int product = x * y; 
return product > 10 && product < 100; 
} 


The ability to introduce and reuse an intermediate variable (product, in this case) in 
an expression-bodied method is convenient. Unfortunately, it works only when the 
method in question has a bool return type. 


Constant Pattern 


The constant pattern is the bread and butter of switch statements (and until C# 7, it 
was the only supported pattern). For consistency, you also can use the constant pat- 
tern with the is operator from C# 7, making the following legal: 


void Foo (object obj) 

{ 
// C# won't let you use the == operator, because obj is object. 
// However, we can use 'is'
if (obj is 3)... 

} 


This is equivalent to the following: 


void Foo (object obj) 


{ 
if (obj is int && (int)obj == 3)... 
} 
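The equivalence above has a consequence worth checking: because obj is 3 requires the boxed value to be an int, a long holding 3 does not match. The sketch below (ConstantPatterns and Describe are hypothetical names) also uses null, which is accepted as a constant pattern as well.

```csharp
public static class ConstantPatterns
{
  public static string Describe (object obj)
  {
    if (obj is null) return "null";        // null as a constant pattern
    if (obj is 3) return "the int 3";      // matches only a boxed int equal to 3
    return "something else";
  }
}
```

Describe (3) returns "the int 3", whereas Describe (3L) returns "something else".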


Attributes 


You're already familiar with the notion of attributing code elements of a program 
with modifiers, such as virtual or ref. These constructs are built into the language. 
Attributes are an extensible mechanism for adding custom information to code ele- 
ments (assemblies, types, members, return values, parameters, and generic type 
parameters). This extensibility is useful for services that integrate deeply into the 
type system, without requiring special keywords or constructs in the C# language. 


A good scenario for attributes is serialization—the process of converting arbitrary 
objects to and from a particular format for storage or transmission. In this scenario, 
an attribute on a field can specify the translation between C#’s representation of the
field and the format’s representation of the field. 


Attribute Classes 


An attribute is defined by a class that inherits (directly or indirectly) from the 
abstract class System.Attribute. To attach an attribute to a code element, specify 
the attribute's type name in square brackets, before the code element. For example, 
the following attaches the ObsoleteAttribute to the Foo class: 


[ObsoleteAttribute] 
public class Foo {...} 


This particular attribute is recognized by the compiler and will cause compiler 
warnings if a type or member marked as obsolete is referenced. By convention, all 
attribute types end in the word Attribute. C# recognizes this and allows you to omit 
the suffix when attaching an attribute: 


[Obsolete] 
public class Foo {...} 


ObsoleteAttribute is a type declared in the System namespace as follows (simpli- 
fied for brevity): 


public sealed class ObsoleteAttribute : Attribute {...} 


The C# language and .NET Core include a number of predefined attributes. We 
describe how to write your own attributes in Chapter 19. 
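Because attributes are stored in metadata, they can also be read back at runtime (reflection is covered in Chapter 19). The following sketch marks a method with [Obsolete] plus an explanatory message (ObsoleteAttribute has a constructor that accepts a message string) and then retrieves that message. Legacy, OldMethod, NewMethod, and ObsoleteInspector are hypothetical names.

```csharp
using System;
using System.Reflection;

public class Legacy
{
  [Obsolete ("Use NewMethod instead")]
  public void OldMethod() { }

  public void NewMethod() { }
}

public static class ObsoleteInspector
{
  // Returns the attribute's message, or null if the method isn't marked obsolete
  public static string MessageFor (string methodName)
  {
    MethodInfo m = typeof (Legacy).GetMethod (methodName);
    ObsoleteAttribute attr = m.GetCustomAttribute<ObsoleteAttribute>();
    return attr?.Message;
  }
}
```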


Named and Positional Attribute Parameters 


Attributes can have parameters. In the following example, we apply XmlTypeAttribute
to a class. This attribute instructs the XML serializer (in System.Xml.Serialization)
as to how an object is represented in XML and accepts several attribute parameters.
The following attribute maps the CustomerEntity class to an XML element named
Customer, which belongs to the http://oreilly.com namespace:


[XmlType ("Customer", Namespace="http://oreilly.com")]
public class CustomerEntity { ... } 


Attribute parameters fall into one of two categories: positional or named. In the pre- 
ceding example, the first argument is a positional parameter; the second is a named 
parameter. Positional parameters correspond to parameters of the attribute type’s 
public constructors. Named parameters correspond to public fields or public prop- 
erties on the attribute type. 


When specifying an attribute, you must include positional parameters that corre- 
spond to one of the attribute’s constructors. Named parameters are optional. 


In Chapter 19, we describe the valid parameter types and rules for their evaluation. 
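To see how the two kinds of parameter map onto an attribute class, consider this sketch of a hypothetical AuthorAttribute (along with hypothetical Report, Invoice, and AuthorReader types). The constructor's name parameter becomes a positional parameter; the public settable Version property becomes a named parameter, which callers may omit.

```csharp
using System;
using System.Reflection;

[AttributeUsage (AttributeTargets.Class)]
public sealed class AuthorAttribute : Attribute
{
  public AuthorAttribute (string name) => Name = name;   // positional parameter
  public string Name { get; }
  public int Version { get; set; } = 1;                  // named parameter
}

[Author ("Joe", Version = 2)]   // positional first, then named
public class Report { }

[Author ("Sue")]                // named parameter omitted: Version stays 1
public class Invoice { }

public static class AuthorReader
{
  public static (string name, int version) Read (Type t)
  {
    var a = t.GetCustomAttribute<AuthorAttribute>();
    return (a.Name, a.Version);
  }
}
```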
Applying Attributes to Assemblies and Backing Fields


Implicitly, the target of an attribute is the code element it immediately precedes, 
which is typically a type or type member. You can also attach attributes, however, to 
an assembly. This requires that you explicitly specify the attribute's target. Here is 
how you can use the AssemblyFileVersion attribute to attach a version to the 
assembly: 


[assembly: AssemblyFileVersion ("1.2.3.4")] 


From C# 7.3, you can use the field: prefix to apply an attribute to the backing 
fields of an automatic property. This can be useful in controlling serialization: 


[field:NonSerialized] 
public int MyProperty { get; set; } 
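One way to confirm where a field:-targeted attribute ends up is to look for it with reflection. This sketch defines a hypothetical NoteAttribute that is valid only on fields (so writing [Note] directly on the property would not compile) and checks that it lands on the property's compiler-generated backing field rather than on the property itself. Settings, Timeout, and BackingFieldCheck are also hypothetical names.

```csharp
using System;
using System.Linq;
using System.Reflection;

[AttributeUsage (AttributeTargets.Field)]
public sealed class NoteAttribute : Attribute { }

public class Settings
{
  [field: Note]                 // applied to the hidden backing field
  public int Timeout { get; set; }
}

public static class BackingFieldCheck
{
  // The backing field is private, so search non-public instance fields
  public static bool FieldHasNote() =>
    typeof (Settings)
      .GetFields (BindingFlags.Instance | BindingFlags.NonPublic)
      .Any (f => f.IsDefined (typeof (NoteAttribute), false));

  public static bool PropertyHasNote() =>
    typeof (Settings).GetProperty (nameof (Settings.Timeout))
      .IsDefined (typeof (NoteAttribute), false);
}
```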
Specifying Multiple Attributes 


You can specify multiple attributes for a single code element. You can list each 
attribute either within the same pair of square brackets (separated by a comma) or 
in separate pairs of square brackets (or a combination of the two). The following 
three examples are semantically identical: 


[Serializable, Obsolete, CLSCompliant(false)]
public class Bar {...}

[Serializable] [Obsolete] [CLSCompliant(false)]
public class Bar {...}

[Serializable, Obsolete]
[CLSCompliant(false)]
public class Bar {...}


Caller Info Attributes 


You can tag optional parameters with one of three caller info attributes, which 
instruct the compiler to feed information obtained from the caller’s source code into 
the parameter’s default value: 

• [CallerMemberName] applies the caller’s member name
• [CallerFilePath] applies the path to the caller’s source code file
• [CallerLineNumber] applies the line number in the caller’s source code file


The Foo method in the following program demonstrates all three: 


using System;
using System.Runtime.CompilerServices;

class Program
{
  static void Main() => Foo();

  static void Foo (
    [CallerMemberName] string memberName = null,
    [CallerFilePath] string filePath = null,
    [CallerLineNumber] int lineNumber = 0)
  {
    Console.WriteLine (memberName);
    Console.WriteLine (filePath);
    Console.WriteLine (lineNumber);
  }
}
Assuming that our program resides in c:\source\test\Program.cs, the output would 
be: 


Main 
c:\source\test\Program.cs 
6 


As with standard optional parameters, the substitution is done at the calling site. 
Hence, our Main method is syntactic sugar for this: 


static void Main() => Foo ("Main", @"c:\source\test\Program.cs", 6); 
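Because the substitution happens at each call site, a single helper can tag log messages with their origin. In this sketch, LogHelper.Describe and the OrderService caller are hypothetical names; only the three caller info attributes come from the framework. Path.GetFileName trims the full path down to the file name.

```csharp
using System.IO;
using System.Runtime.CompilerServices;

public static class LogHelper
{
  // The compiler fills in member, file, and line at each call site
  public static string Describe (string message,
    [CallerMemberName] string member = null,
    [CallerFilePath] string file = null,
    [CallerLineNumber] int line = 0)
    => $"{Path.GetFileName (file)}:{line} {member}: {message}";
}

public static class OrderService   // hypothetical caller
{
  public static string PlaceOrder() => LogHelper.Describe ("order placed");
}
```

Calling OrderService.PlaceOrder() returns a string of the form file:line PlaceOrder: order placed, where the file name and line number depend on where the call appears.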


Caller info attributes are useful for logging—and for implementing patterns such as
firing a single change notification event whenever any property on an object
changes. In fact, there's a standard interface in .NET Core for this called
INotifyPropertyChanged (in System.ComponentModel):


public interface INotifyPropertyChanged 


{ 
event PropertyChangedEventHandler PropertyChanged; 


} 


public delegate void PropertyChangedEventHandler 
(object sender, PropertyChangedEventArgs e); 


public class PropertyChangedEventArgs : EventArgs 


{ 
public PropertyChangedEventArgs (string propertyName);


public virtual string PropertyName { get; } 


} 


Notice that PropertyChangedEventArgs requires the name of the property that 
changed. By applying the [CallerMemberName] attribute, however, we can imple- 
ment this interface and invoke the event without ever specifying property names: 


public class Foo : INotifyPropertyChanged 


{ 
public event PropertyChangedEventHandler PropertyChanged = delegate { }; 


void RaisePropertyChanged ([CallerMemberName] string propertyName = null) 


{ 
PropertyChanged (this, new PropertyChangedEventArgs (propertyName));


} 


string customerName; 
public string CustomerName


{ 
get { return customerName; } 
set 


{ 
if (value == customerName) return; 
customerName = value; 
RaisePropertyChanged(); 
// The compiler converts the above line to: 
// RaisePropertyChanged ("CustomerName");

} 

} 
} 
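When a class has many properties, the compare, assign, and raise steps in each setter become repetitive. A widely used variation folds them into a generic helper that takes the backing field by ref, while [CallerMemberName] still supplies the property name. This is a sketch rather than the only approach: Customer, SetField, Name, and Age are hypothetical names, and the event is raised with the null-conditional ?.Invoke instead of being initialized to an empty delegate.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;

public class Customer : INotifyPropertyChanged
{
  public event PropertyChangedEventHandler PropertyChanged;

  // Assigns the field and raises PropertyChanged only if the value changed
  protected bool SetField<T> (ref T field, T value,
                              [CallerMemberName] string propertyName = null)
  {
    if (EqualityComparer<T>.Default.Equals (field, value)) return false;
    field = value;
    PropertyChanged?.Invoke (this, new PropertyChangedEventArgs (propertyName));
    return true;
  }

  string name;
  public string Name
  {
    get => name;
    set => SetField (ref name, value);   // propertyName becomes "Name"
  }

  int age;
  public int Age
  {
    get => age;
    set => SetField (ref age, value);    // propertyName becomes "Age"
  }
}
```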


Dynamic Binding 


Dynamic binding defers binding—the process of resolving types, members, and 
operators—from compile time to runtime. Dynamic binding is useful when at com- 
pile time you know that a certain function, member, or operation exists, but the 
compiler does not. This commonly occurs when you are interoperating with 
dynamic languages (such as IronPython) and COM, as well as for scenarios in 
which you might otherwise use reflection. 


A dynamic type is declared with the contextual keyword dynamic: 


dynamic d = GetSomeObject(); 
d.Quack(); 


A dynamic type tells the compiler to relax. We expect the runtime type of d to have a 
Quack method. We just can’t prove it statically. Because d is dynamic, the compiler 
defers binding Quack to d until runtime. To understand what this means requires 
distinguishing between static binding and dynamic binding. 


Static Binding versus Dynamic Binding 


The canonical binding example is mapping a name to a specific function when 
compiling an expression. To compile the following expression, the compiler needs 
to find the implementation of the method named Quack: 


d.Quack(); 
Let’s suppose that the static type of d is Duck: 


Duck d=... 
d.Quack(); 


In the simplest case, the compiler does the binding by looking for a parameterless 
method named Quack on Duck. Failing that, the compiler extends its search to meth- 
ods taking optional parameters, methods on base classes of Duck, and extension 
methods that take Duck as its first parameter. If no match is found, you'll get a com- 
pilation error. Regardless of what method is bound, the bottom line is that the 
binding is done by the compiler, and the binding utterly depends on statically
knowing the types of the operands (in this case, d). This makes it static binding. 


Now let’s change the static type of d to object: 


object d=... 
d.Quack(); 


Calling Quack gives us a compilation error, because although the value stored in d 
can contain a method called Quack, the compiler cannot know it, because the only 
information it has is the type of the variable, which in this case is object. But let’s 
now change the static type of d to dynamic: 


dynamic d=... 
d.Quack(); 


A dynamic type is like object—it’s equally nondescriptive about a type. The differ- 
ence is that it lets you use it in ways that aren't known at compile time. A dynamic 
object binds at runtime based on its runtime type, not its compile-time type. When 
the compiler sees a dynamically bound expression (which in general is an expres- 
sion that contains any value of type dynamic), it merely packages up the expression 
such that the binding can be done later at runtime. 
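A small sketch of the difference: LengthOf (a hypothetical helper) takes a dynamic parameter, so d.Length compiles even though the compiler cannot prove the member exists. At runtime, the member access binds against whatever type arrives, so both a string and an array work, since each happens to have a Length property. The commented-out object version is the statically bound equivalent, which the compiler rejects.

```csharp
public static class BindingDemo
{
  // Bound at runtime against d's runtime type
  public static int LengthOf (dynamic d)
  {
    return d.Length;
  }

  // Statically bound: would not compile, because object has no Length member
  // public static int LengthOf (object o) => o.Length;
}
```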


At runtime, if a dynamic object implements IDynamicMetaObjectProvider, that 
interface is used to perform the binding. If not, binding occurs in almost the same 
way as it would have had the compiler known the dynamic object’s runtime type. 
These two alternatives are called custom binding and language binding. 


Custom Binding 


Custom binding occurs when a dynamic object implements IDynamicMetaObjectProvider
(IDMOP). Although you can implement IDMOP on types
that you write in C#, and that is useful to do, the more common case is that you 
have acquired an IDMOP object from a dynamic language that is implemented 
in .NET on the Dynamic Language Runtime (DLR), such as IronPython or IronRuby.
Objects from those languages implicitly implement IDMOP as a means by
which to directly control the meanings of operations performed on them. 


We discuss custom binders in greater detail in Chapter 20, but for now, let’s write a 
simple one to demonstrate the feature: 


using System; 
using System.Dynamic; 


public class Test 


{ 
static void Main() 
{ 
dynamic d = new Duck(); 
d.Quack(); // Quack method was called 
d.Waddle(); // Waddle method was called 
} 
}


public class Duck : DynamicObject 
{ 


public override bool TryInvokeMember ( 
InvokeMemberBinder binder, object[] args, out object result) 


{ 
Console.WriteLine (binder.Name + " method was called"); 
result = null; 
return true; 
} 
} 


The Duck class doesn’t actually have a Quack method. Instead, it uses custom bind- 
ing to intercept and interpret all method calls. 
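TryInvokeMember is only one of the hooks that DynamicObject offers. As another sketch, the hypothetical PropertyBag below overrides TryGetMember and TrySetMember so that any property name can be assigned and later read back. Returning false from TryGetMember (for a name that was never set) tells the runtime that binding failed, which surfaces at the call site as a RuntimeBinderException (described below).

```csharp
using System.Collections.Generic;
using System.Dynamic;

public class PropertyBag : DynamicObject
{
  readonly Dictionary<string, object> values = new Dictionary<string, object>();

  // Called for reads such as bag.Name
  public override bool TryGetMember (GetMemberBinder binder, out object result) =>
    values.TryGetValue (binder.Name, out result);

  // Called for writes such as bag.Name = "Bob"
  public override bool TrySetMember (SetMemberBinder binder, object value)
  {
    values[binder.Name] = value;
    return true;
  }
}
```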


Language Binding 


Language binding occurs when a dynamic object does not implement IDMOP. Lan- 
guage binding is useful when working around imperfectly designed types or inher- 
ent limitations in the .NET type system (we explore more scenarios in Chapter 20). 
A typical problem when using numeric types is that they have no common inter- 
face. We have seen that we can bind methods dynamically; the same is true for 
operators: 


static dynamic Mean (dynamic x, dynamic y) => (x + y) / 2;

static void Main()
{
  int x = 3, y = 4;
  Console.WriteLine (Mean (x, y));
}


The benefit is obvious—you don't need to duplicate code for each numeric type. 
However, you lose static type safety, risking runtime exceptions rather than 
compile-time errors. 
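The following sketch exercises Mean with several numeric types. Two int arguments produce 3, not 3.5: the runtime binder applies integer division just as statically compiled int code would, which is the parity described below. Passing two strings also compiles, but fails at runtime because string supports + yet has no / operator. The Numeric class name is hypothetical.

```csharp
public static class Numeric
{
  // One body serves every type that supports + and division by an int
  public static dynamic Mean (dynamic x, dynamic y) => (x + y) / 2;
}
```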


Dynamic binding circumvents static type safety, but not run- 
time type safety. Unlike with reflection (Chapter 19), you can't 
circumvent member accessibility rules with dynamic binding. 


By design, language runtime binding behaves as similarly as possible to static bind- 
ing, had the runtime types of the dynamic objects been known at compile time. In 
our previous example, the behavior of our program would be identical if we hardco- 
ded Mean to work with the int type. The most notable exception in parity between 
static and dynamic binding is for extension methods, which we discuss in “Uncalla- 
ble Functions” on page 215. 
Dynamic binding also incurs a performance hit. Because of
the DLR’s caching mechanisms, however, repeated calls to the 
same dynamic expression are optimized—allowing you to effi- 
ciently call dynamic expressions in a loop. This optimization 
brings the typical overhead for a simple dynamic expression 
on today’s hardware down to less than 100 nanoseconds. 


RuntimeBinderException 


If a member fails to bind, a RuntimeBinderException is thrown. You can think of 
this like a compile-time error at runtime: 


dynamic d = 5; 
d.Hello(); // throws RuntimeBinderException 


The exception is thrown because the int type has no Hello method. 
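Because binding failures surface as exceptions, code that must cope with arbitrary objects can catch RuntimeBinderException (in the Microsoft.CSharp.RuntimeBinder namespace). Here is a sketch with hypothetical Duck and DuckTyping types. Catching exceptions is a relatively slow way to probe for a member, so this suits occasional calls rather than hot loops.

```csharp
using Microsoft.CSharp.RuntimeBinder;

public class Duck
{
  public string Quack() => "Quack!";
}

public static class DuckTyping
{
  // Calls Quack if the runtime type has one; otherwise reports the failure
  public static string TryQuack (dynamic d)
  {
    try
    {
      return d.Quack();
    }
    catch (RuntimeBinderException)
    {
      return "cannot quack";
    }
  }
}
```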


Runtime Representation of dynamic 


There is a deep equivalence between the dynamic and object types. The runtime 
treats the following expression as true: 


typeof (dynamic) == typeof (object) 
This principle extends to constructed types and array types: 


typeof (List<dynamic>) == typeof (List<object>) 

typeof (dynamic[]) == typeof (object[]) 
Like an object reference, a dynamic reference can point to an object of any type 
(except pointer types): 


dynamic x = "hello"; 
Console.WriteLine (x.GetType().Name); // String 


x = 123; // No error (despite same variable) 

Console.WriteLine (x.GetType().Name); // Int32 
Structurally, there is no difference between an object reference and a dynamic refer- 
ence. A dynamic reference simply enables dynamic operations on the object it 
points to. You can convert from object to dynamic to perform any dynamic opera- 
tion you want on an object: 


object o = new System.Text.StringBuilder();
dynamic d = o;
d.Append ("hello");
Console.WriteLine (o); // hello
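The equivalence can be checked directly, with one caveat: the compiler rejects typeof (dynamic) on its own, so this sketch compares constructed types instead. It also repeats the reassignment example to show the runtime type changing. DynamicIdentity and its methods are hypothetical names.

```csharp
using System.Collections.Generic;

public static class DynamicIdentity
{
  // dynamic becomes object inside constructed types
  public static bool ListTypesMatch() => typeof (List<dynamic>) == typeof (List<object>);

  // The same variable can hold values of unrelated runtime types
  public static string RuntimeTypeNames()
  {
    dynamic x = "hello";
    string first = x.GetType().Name;    // String
    x = 123;
    string second = x.GetType().Name;   // Int32
    return first + "," + second;
  }
}
```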
Reflecting on a type exposing (public) dynamic members
reveals that those members are represented as annotated 
objects; for example: 


public class Test 


{ 


public dynamic Foo; 


} 


is equivalent to: 


public class Test 


{ 
[System.Runtime.CompilerServices.DynamicAttribute]
public object Foo; 


} 


This allows consumers of that type to know that Foo should be 
treated as dynamic while allowing languages that don't sup- 
port dynamic binding to fall back to object. 
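You can observe this encoding with reflection. In the sketch below (Holder and DynamicMetadata are hypothetical names), the field declared as dynamic reports object as its FieldType, and the compiler-emitted DynamicAttribute is what records the original intent.

```csharp
using System.Runtime.CompilerServices;

public class Holder
{
  public dynamic Foo;
}

public static class DynamicMetadata
{
  public static bool FieldIsObject() =>
    typeof (Holder).GetField (nameof (Holder.Foo)).FieldType == typeof (object);

  public static bool FieldCarriesDynamicAttribute() =>
    typeof (Holder).GetField (nameof (Holder.Foo))
      .IsDefined (typeof (DynamicAttribute), false);
}
```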


Dynamic Conversions 

The dynamic type has implicit conversions to and from all other types: 

int i = 7;
dynamic d = i;
long j = d; // No cast required (implicit conversion)

For the conversion to succeed, the runtime type of the dynamic object must be
implicitly convertible to the target static type. The preceding example worked
because an int is implicitly convertible to a long.

The following example throws a RuntimeBinderException because an int is not
implicitly convertible to a short:

int i = 7;
dynamic d = i;
short j = d; // throws RuntimeBinderException
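The sketch below wraps both cases in methods (DynamicConversions and its members are hypothetical names) and adds a third: an explicit cast asks the runtime binder for an explicit conversion, and because an explicit int-to-short conversion exists, (short) d succeeds where the implicit assignment failed.

```csharp
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicConversions
{
  // Succeeds: the runtime type (int) converts implicitly to long
  public static long Widen (dynamic d) { long j = d; return j; }

  // Fails for an int at runtime: int to short is not an implicit conversion
  public static bool NarrowingThrows (dynamic d)
  {
    try { short s = d; return false; }
    catch (RuntimeBinderException) { return true; }
  }

  // An explicit cast requests an explicit conversion, which int to short has
  public static short NarrowExplicitly (dynamic d) => (short) d;
}
```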


var Versus dynamic 


The var and dynamic types bear a superficial resemblance, but the difference is 
deep: 


• var says, “Let the compiler figure out the type.”
• dynamic says, “Let the runtime figure out the type.”

To illustrate:
dynamic x = "hello"; // Static type is dynamic, runtime type is string
var y = "hello";     // Static type is string, runtime type is string
int i = x;           // Runtime error (cannot convert string to int)
int j = y;           // Compile-time error (cannot convert string to int)


The static type of a variable declared with var can be dynamic: 
dynamic x = "hello";
var y = x;  // Static type of y is dynamic
int z = y;  // Runtime error (cannot convert string to int)
Dynamic Expressions 


Fields, properties, methods, events, constructors, indexers, operators, and conver- 
sions can all be called dynamically. 


Trying to consume the result of a dynamic expression with a void return type is 
prohibited—just as with a statically typed expression. The difference is that the 
error occurs at runtime: 


dynamic list = new List<int>(); 
var result = list.Add (5); // RuntimeBinderException thrown 


Expressions involving dynamic operands are typically themselves dynamic because 
the effect of absent type information is cascading: 


dynamic x = 2; 
var y = x * 3; // Static type of y is dynamic 


There are a couple of obvious exceptions to this rule. First, casting a dynamic 
expression to a static type yields a static expression: 


dynamic x = 2; 
var y = (int)x; // Static type of y is int 


Second, constructor invocations always yield static expressions—even when called 
with dynamic arguments. In this example, x is statically typed to a StringBuilder: 


dynamic capacity = 10; 
var x = new System.Text.StringBuilder (capacity);


In addition, there are a few edge cases for which an expression containing a 
dynamic argument is static, including passing an index to an array and delegate cre- 
ation expressions. 


Dynamic Calls Without Dynamic Receivers 


The canonical use case for dynamic involves a dynamic receiver. This means that a 
dynamic object is the receiver of a dynamic function call: 


dynamic x = ...;
x.Foo(); // x is the receiver


However, you can also call statically known functions with dynamic arguments. 
Such calls are subject to dynamic overload resolution, and can include the 
following: 

• Static methods
• Instance constructors
• Instance methods on receivers with a statically known type
In the following example, the particular Foo that gets dynamically bound is dependent
on the runtime type of the dynamic argument:


class Program 


{ 


static void Foo (int x) => Console.WriteLine ("int"); 
static void Foo (string x) => Console.WriteLine ("string"); 


static void Main() 


{ 
dynamic x = 5; 
dynamic y = "watermelon"; 
Foo (x); // int
Foo (y); // string
} 


} 
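Written as a function that returns its result instead of printing it, the same idea looks like this (Overloads and Pick are hypothetical names). Pick's call to Foo has no dynamic receiver, only a dynamic argument, so the choice between Foo (int) and Foo (string) waits until the argument's runtime type is known. The private overloads remain reachable because the binding is performed in the context of the calling class.

```csharp
public static class Overloads
{
  static string Foo (int x) => "int";
  static string Foo (string x) => "string";

  // Overload resolution uses the runtime type of arg
  public static string Pick (dynamic arg) => Foo (arg);
}
```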


Because a dynamic receiver is not involved, the compiler can statically perform a 
basic check to see whether the dynamic call will succeed. It checks whether a func- 
tion with the correct name and number of parameters exists. If no candidate is 
found, you get a compile-time error: 


class Program 


{ 


static void Foo (int x) => Console.WriteLine ("int"); 
static void Foo (string x) => Console.WriteLine ("string"); 


static void Main() 


{ 
dynamic x = 5; 
Foo (x, x); // Compiler error - wrong number of parameters
Fook (x); // Compiler error - no such method name 

} 


} 


Static Types in Dynamic Expressions 


It's obvious that dynamic types are used in dynamic binding. It’s not so obvious that 
static types are also used—wherever possible—in dynamic binding. Consider the 
following: 


class Program 
{ 
static void Foo (object x, object y) { Console.WriteLine ("oo"); } 
static void Foo (object x, string y) { Console.WriteLine ("os"); } 
static void Foo (string x, object y) { Console.WriteLine ("so"); } 
static void Foo (string x, string y) { Console.WriteLine ("ss"); } 
static void Main() 
{ 
object o = "hello"; 
dynamic d = "goodbye"; 
Foo (o, d); // os 
}
} 


The call to Foo(o,d) is dynamically bound because one of its arguments, d, is 
dynamic. But because o is statically known, the binding—even though it occurs 
dynamically—will make use of that. In this case, overload resolution will pick the 
second implementation of Foo due to the static type of o and the runtime type of d. 
In other words, the compiler is “as static as it can possibly be.”
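This sketch (MixedBinding and its methods are hypothetical names) returns the chosen overload's label so that the effect of the static type can be compared directly. Casting o to dynamic discards its static type, at which point the runtime sees two strings and picks Foo (string, string).

```csharp
public static class MixedBinding
{
  static string Foo (object x, object y) => "oo";
  static string Foo (object x, string y) => "os";
  static string Foo (string x, object y) => "so";
  static string Foo (string x, string y) => "ss";

  // o's static type (object) is honored; only d's type comes from runtime
  public static string StaticPlusDynamic()
  {
    object o = "hello";
    dynamic d = "goodbye";
    return Foo (o, d);             // os
  }

  // Making o dynamic too lets the runtime see that it is a string
  public static string BothDynamic()
  {
    object o = "hello";
    dynamic d = "goodbye";
    return Foo ((dynamic) o, d);   // ss
  }
}
```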


Uncallable Functions 


Some functions cannot be called dynamically. You cannot call the following: 


• Extension methods (via extension method syntax)
• Members of an interface, if you need to cast to that interface to do so
• Base members hidden by a subclass


Understanding why this is so is useful in understanding dynamic binding. 


Dynamic binding requires two pieces of information: the name of the function to 
call, and the object upon which to call the function. However, in each of the three 
uncallable scenarios, an additional type is involved, which is known only at compile 
time. As of this writing, there’s no way to specify these additional types dynamically. 
When calling extension methods, that additional type is implicit. It’s the static class
on which the extension method is defined. The compiler searches for it given the
using directives in your source code. This makes extension methods compile-time-only
concepts because using directives melt away upon compilation (after they've
done their job in the binding process in mapping simple names to namespace-qualified
names).
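One practical consequence: you can still reach an extension method from dynamic code by calling it as an ordinary static method, which names the static class explicitly and so supplies the missing type. The sketch below assumes a hypothetical extension method, StringExtensions.Shout; ExtensionDemo and its two methods are likewise illustrative names, not framework APIs.

```csharp
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical extension method, defined only for this sketch.
static class StringExtensions
{
  public static string Shout (this string s) => s.ToUpper() + "!";
}

static class ExtensionDemo
{
  // Extension-method syntax on a dynamic receiver: the runtime binder looks
  // only at string's own members, so it cannot find Shout.
  public static string CallAsExtension (dynamic d)
  {
    try
    {
      return d.Shout();
    }
    catch (RuntimeBinderException)
    {
      return "RuntimeBinderException";
    }
  }

  // Naming the static class supplies the "additional type" explicitly.
  // The call is still dynamically bound (d is dynamic), but now to an
  // ordinary static method that the binder can see.
  public static string CallAsStatic (dynamic d) => StringExtensions.Shout (d);
}
```

Passing "hi" to CallAsExtension yields "RuntimeBinderException", whereas CallAsStatic returns "HI!".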


When calling members via an interface, you specify that additional type via an 
implicit or explicit cast. There are two scenarios for which you might want to do 
this: when calling explicitly implemented interface members, and when calling 
interface members implemented in a type internal to another assembly. We can 
illustrate the former with the following two types: 


interface IFoo { void Test(); } 
class Foo : IFoo { void IFoo.Test() {} } 


To call the Test method, we must cast to the IFoo interface. This is easy with static 
typing: 

IFoo f = new Foo();   // Implicit cast to interface
f.Test();

Now consider the situation with dynamic typing:


IFoo f = new Foo(); 
dynamic d = f; 
d.Test(); // Exception thrown 
The implicit cast to IFoo in the first line tells the compiler to bind subsequent member
calls on f to IFoo rather than Foo; in other words, to view that object through the lens
of the IFoo interface. However, that lens is lost at runtime, so the DLR cannot complete
the binding. The loss is illustrated as follows:


Console.WriteLine (f.GetType().Name);   // Foo


A similar situation arises when calling a hidden base member: you must specify an 
additional type via either a cast or the base keyword—and that additional type is 
lost at runtime. 
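A common workaround for the interface case is to cast the dynamic value back to the interface, which reinstates the missing type at compile time. In this sketch, the IFoo/Foo pair from above is varied so that Test returns a string (an assumption made only so the outcome can be checked), and InterfaceDemo is a hypothetical helper class.

```csharp
using Microsoft.CSharp.RuntimeBinder;

// Variant of the IFoo/Foo types above: Test returns a string.
interface IFoo { string Test(); }
class Foo : IFoo { string IFoo.Test() => "IFoo.Test called"; }

static class InterfaceDemo
{
  public static string CallDynamically (IFoo f)
  {
    dynamic d = f;          // the static type IFoo is not carried into d
    try
    {
      return d.Test();      // binder sees only Foo, whose Test is explicitly implemented
    }
    catch (RuntimeBinderException)
    {
      return "RuntimeBinderException";
    }
  }

  public static string CallAfterCast (IFoo f)
  {
    dynamic d = f;
    return ((IFoo) d).Test();   // the cast restores the IFoo "lens"; Test binds statically
  }
}
```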


Operator Overloading 


You can overload operators to provide more natural syntax for custom types. Operator
overloading is most appropriately used for implementing custom structs that
represent fairly primitive data types. For example, a custom numeric type is an
excellent candidate for operator overloading.


The following symbolic operators can be overloaded: 


+ (unary)   - (unary)   !    ~    ++
--          +           -    *    /
%           &           |    ^    <<
>>          ==          !=   >    <
>=          <=


The following operators are also overloadable: 


• Implicit and explicit conversions (with the implicit and explicit keywords)
• The true and false operators (not literals)

The following operators are indirectly overloaded:

• The compound assignment operators (e.g., +=, /=) are implicitly overloaded by
  overloading the noncompound operators (e.g., +, /).
• The conditional operators && and || are implicitly overloaded by overloading
  the bitwise operators & and | (the type must also define the true and false
  operators).


Operator Functions 


You overload an operator by declaring an operator function. An operator function
has the following rules:

• The name of the function is specified with the operator keyword followed by
  an operator symbol.
• The operator function must be marked static and public.
• The parameters of the operator function represent the operands.
• The return type of an operator function represents the result of an expression.
• At least one of the operands must be the type in which the operator function is
  declared.


In the following example, we define a struct called Note representing a musical note 
and then overload the + operator: 


public struct Note
{
  int value;
  public Note (int semitonesFromA) { value = semitonesFromA; }

  public static Note operator + (Note x, int semitones)
  {
    return new Note (x.value + semitones);
  }
}

This overload allows us to add an int to a Note: 


Note B = new Note (2); 
Note CSharp = B + 2; 


Overloading an operator automatically overloads the corresponding compound
assignment operator. In our example, because we overloaded +, we can use +=, too:

CSharp += 2;

Just as with methods and properties, C# allows operator functions comprising a single
expression to be written more tersely with expression-bodied syntax:


public static Note operator + (Note x, int semitones) 
=> new Note (x.value + semitones); 
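Putting these pieces together, here is a minimal runnable sketch of the Note example. The SemitonesFromA property and the NoteDemo class are hypothetical additions so the result can be inspected; the operator itself is the expression-bodied one just shown.

```csharp
public struct Note
{
  int value;
  public Note (int semitonesFromA) { value = semitonesFromA; }

  // Hypothetical read-only accessor so the result can be inspected.
  public int SemitonesFromA => value;

  public static Note operator + (Note x, int semitones)
    => new Note (x.value + semitones);
}

static class NoteDemo
{
  public static int Run()
  {
    Note B = new Note (2);
    Note CSharp = B + 2;   // calls operator +
    CSharp += 2;           // compiler expands this to CSharp = CSharp + 2
    return CSharp.SemitonesFromA;
  }
}
```

NoteDemo.Run() returns 6: two semitones from A, plus 2, plus 2 again via +=.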


Overloading Equality and Comparison Operators 


Equality and comparison operators are sometimes overloaded when writing structs,
and in rare cases when writing classes. Special rules and obligations come with
overloading the equality and comparison operators, which we explain in Chapter 6. A
summary of these rules is as follows:


Pairing
The C# compiler requires that operators forming logical pairs both be defined.
These operators are (== !=), (< >), and (<= >=).


Equals and GetHashCode
In most cases, if you overload (==) and (!=), you must override the Equals and
GetHashCode methods defined on object in order to get meaningful behavior.
The C# compiler will give a warning if you do not do this. (See “Equality
Comparison” on page 296 in Chapter 6 for more details.)


IComparable and IComparable<T> 
If you overload (< >) and (<= >=), you should implement IComparable and 
IComparable<T>. 
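As a sketch of how these obligations fit together, the following version of Note (reduced to a single semitone field, for brevity) declares both comparison pairs, overrides Equals and GetHashCode, and implements IComparable and IComparable<T>. IEquatable<Note> is an extra this sketch adds; it is common practice for structs but not one of the rules above.

```csharp
using System;

public struct Note : IEquatable<Note>, IComparable<Note>, IComparable
{
  readonly int value;
  public Note (int semitonesFromA) { value = semitonesFromA; }

  // Pairing: declaring == without != (or < without >, <= without >=)
  // is a compile-time error (CS0216).
  public static bool operator == (Note a, Note b) => a.value == b.value;
  public static bool operator != (Note a, Note b) => !(a == b);
  public static bool operator < (Note a, Note b) => a.value < b.value;
  public static bool operator > (Note a, Note b) => a.value > b.value;
  public static bool operator <= (Note a, Note b) => a.value <= b.value;
  public static bool operator >= (Note a, Note b) => a.value >= b.value;

  // Overriding Equals and GetHashCode keeps == consistent with object.Equals
  // and avoids warnings CS0660 and CS0661.
  public bool Equals (Note other) => value == other.value;
  public override bool Equals (object obj) => obj is Note n && Equals (n);
  public override int GetHashCode() => value.GetHashCode();

  // Comparison interfaces, as recommended when < > <= >= are overloaded.
  public int CompareTo (Note other) => value.CompareTo (other.value);
  int IComparable.CompareTo (object obj) => CompareTo ((Note) obj);
}
```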


Custom Implicit and Explicit Conversions 


Implicit and explicit conversions are overloadable operators. These conversions are 
typically overloaded to make converting between strongly related types (such as 
numeric types) concise and natural. 


To convert between weakly related types, the following strategies are more suitable: 


• Write a constructor that has a parameter of the type to convert from.
• Write ToXXX and (static) FromXXX methods to convert between types.


As explained in the discussion on types, the rationale behind implicit conversions is
that they are guaranteed to succeed and not lose information during the conversion.
Conversely, an explicit conversion should be required either when runtime circumstances
will determine whether the conversion will succeed, or if information might
be lost during the conversion.


Custom conversions are ignored by the as and is operators: 


Console.WriteLine (554.37 is Note); // False 
Note n = 554.37 as Note; // Error 


In this example, we define conversions between our musical Note type and a double 
(which represents the frequency in hertz of that note): 


// Convert to hertz 
public static implicit operator double (Note x) 
=> 440 * Math.Pow (2, (double) x.value / 12 ); 


// Convert from hertz (accurate to the nearest semitone) 
public static explicit operator Note (double x) 
=> new Note ((int) (0.5 + 12 * (Math.Log (x/440) / Math.Log(2) ) )); 


Note n = (Note) 554.37;   // explicit conversion
double x = n;             // implicit conversion


Following our own guidelines, this example might be better 
implemented with a ToFrequency method (and a static 
FromFrequency method) instead of implicit and explicit 
operators. 
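Following that guideline, a sketch might keep the conversion operators while also exposing method-based equivalents. ToFrequency, FromFrequency, and the SemitonesFromA property are illustrative names. This sketch also swaps the (int) (0.5 + ...) rounding for Math.Round, because casting to int truncates toward zero, which would round frequencies below 440 Hz to the wrong semitone.

```csharp
using System;

public struct Note
{
  int value;
  public Note (int semitonesFromA) { value = semitonesFromA; }
  public int SemitonesFromA => value;

  // Convert to hertz
  public static implicit operator double (Note x)
    => 440 * Math.Pow (2, (double) x.value / 12);

  // Convert from hertz, rounding to the nearest semitone in both directions
  public static explicit operator Note (double x)
    => new Note ((int) Math.Round (12 * Math.Log (x / 440, 2)));

  // Method-based alternatives (illustrative names) that reuse the operators
  public double ToFrequency() => this;
  public static Note FromFrequency (double hertz) => (Note) hertz;
}
```

With this version, (Note) 554.37 is 4 semitones above A, and Note.FromFrequency (369.99) is 3 below.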
Overloading true and false


The true and false operators are overloaded in the extremely rare case of types 
that are Boolean in spirit, but do not have a conversion to bool. An example is a 
type that implements three-state logic: by overloading true and false, such a type 
can work seamlessly with conditional statements and operators—namely, if, do, 
while, for, &&, ||, and ?:. The System.Data.SqlTypes.SqlBoolean struct provides
this functionality:


SqlBoolean a = SqlBoolean.Null;
if (a)
  Console.WriteLine ("True");
else if (!a)
  Console.WriteLine ("False");
else
  Console.WriteLine ("Null");


OUTPUT: 
Null 


The following code is a reimplementation of the parts of SqlBoolean necessary to 
demonstrate the true and false operators: 
public struct SqlBoolean
{
  public static bool operator true (SqlBoolean x)
    => x.m_value == True.m_value;

  public static bool operator false (SqlBoolean x)
    => x.m_value == False.m_value;

  public static SqlBoolean operator ! (SqlBoolean x)
  {
    if (x.m_value == Null.m_value) return Null;
    if (x.m_value == False.m_value) return True;
    return False;
  }

  public static readonly SqlBoolean Null = new SqlBoolean(0);
  public static readonly SqlBoolean False = new SqlBoolean(1);
  public static readonly SqlBoolean True = new SqlBoolean(2);

  private SqlBoolean (byte value) { m_value = value; }
  private byte m_value;
}
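The same mechanism can be tried without referencing System.Data. The following sketch defines a hypothetical Trilean struct, modeled on the SqlBoolean excerpt above, plus a Describe helper (also hypothetical) that runs the same if/else if/else chain and returns the branch taken instead of printing it.

```csharp
// Hypothetical three-state type, modeled on the SqlBoolean excerpt above.
public struct Trilean
{
  public static readonly Trilean Null  = new Trilean (0);
  public static readonly Trilean False = new Trilean (1);
  public static readonly Trilean True  = new Trilean (2);

  readonly byte state;   // default(Trilean) has state 0, i.e., Null
  Trilean (byte state) { this.state = state; }

  // Consulted by if, while, ?: and friends; must be declared as a pair.
  public static bool operator true (Trilean x) => x.state == 2;
  public static bool operator false (Trilean x) => x.state == 1;

  public static Trilean operator ! (Trilean x)
    => x.state == 0 ? Null : x.state == 1 ? True : False;
}

static class TrileanDemo
{
  public static string Describe (Trilean a)
  {
    if (a) return "True";          // operator true
    else if (!a) return "False";   // operator !, then operator true
    else return "Null";
  }
}
```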


Unsafe Code and Pointers 


C# supports direct memory manipulation via pointers within blocks of code 
marked unsafe and compiled with the /unsafe compiler option. Pointer types are 
primarily useful for interoperability with C APIs, but you also can use them for 
accessing memory outside the managed heap or for performance-critical hotspots. 
Pointer Basics


For every unmanaged type V (a value type, such as int or a struct, that contains no
reference-type fields), there is a corresponding pointer type V*. A pointer instance
holds the address of a variable. Pointer types can be (unsafely) cast to any other
pointer type. Following are the main pointer operators:


Operator   Meaning
&          The address-of operator returns a pointer to the address of a variable
*          The dereference operator returns the variable at the address of a pointer
->         The pointer-to-member operator is a syntactic shortcut, in which x->y is
           equivalent to (*x).y

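As a minimal sketch of & and * in practice, assuming the code is compiled with unsafe code enabled as described above, the following hypothetical Triple method takes the address of a local variable and writes through the pointer. A local variable of value type cannot be moved by the garbage collector, so no fixed statement is needed here.

```csharp
static class PointerBasics
{
  // Requires unsafe code to be enabled: /unsafe on the command line, or
  // <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in a .csproj.
  public static unsafe int Triple (int value)
  {
    int local = value;   // a local variable: the GC never moves it
    int* p = &local;     // & takes the address of local
    *p = *p * 3;         // * reads, then writes, local through the pointer
    return local;        // local itself has changed
  }
}
```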
Unsafe Code 


By marking a type, type member, or statement block with the unsafe keyword,
you're permitted to use pointer types and perform C++ style pointer operations on
memory within that scope. Here is an example of using pointers to quickly process a
bitmap:


unsafe void BlueFilter (int[,] bitmap)
{
  int length = bitmap.Length;
  fixed (int* b = bitmap)
  {
    int* p = b;
    for (int i = 0; i < length; i++)
      *p++ &= 0xFF;
  }
}


Unsafe code can run faster than a corresponding safe implementation. In this case,
the code would have required a nested loop with array indexing and bounds checking.
An unsafe C# method can also be faster than calling an external C function
given that there is no overhead associated with leaving the managed execution
environment.


The fixed Statement 


The fixed statement is required to pin a managed object, such as the bitmap in the 
previous example. During the execution of a program, many objects are allocated 
and deallocated from the heap. To avoid unnecessary waste or fragmentation of 
memory, the garbage collector moves objects around. Pointing to an object is futile 
if its address could change while referencing it, so the fixed statement tells the 
garbage collector to “pin” the object and not move it around. This can have an 
impact on the efficiency of the runtime, so you should use fixed blocks only briefly, 
and you should avoid heap allocation within the fixed block. 


Within a fixed statement, you can get a pointer to any value type, an array of value 
types, or a string. In the case of arrays and strings, the pointer will actually point to 
the first element, which is a value type. 
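To illustrate the point about arrays and strings, this sketch pins a string and an int array and reads each through a pointer to its first element. FixedDemo, SumChars, and First are hypothetical names; unsafe code must be enabled, and First assumes a non-empty array (for an empty array, the fixed pointer is null).

```csharp
static class FixedDemo
{
  // Requires unsafe code to be enabled.
  public static unsafe int SumChars (string s)
  {
    int total = 0;
    fixed (char* p = s)        // pins s; p points to its first char
    {
      for (int i = 0; i < s.Length; i++)
        total += p[i];         // each char widens to int
    }
    return total;
  }

  // Assumes a non-empty array.
  public static unsafe int First (int[] values)
  {
    fixed (int* p = values)    // pins values; p points to values[0]
      return *p;
  }
}
```

For example, SumChars ("AB") returns 131 (65 + 66).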
Value types declared inline within reference types require the reference type to be
pinned, as follows: 


class Test
{
  int x;
  static void Main()
  {
    Test test = new Test();
    unsafe
    {
      fixed (int* p = &test.x)   // Pins test
      {
        System.Console.WriteLine (test.x);
      }
    }
  }
}


We describe the fixed statement further in “Mapping a Struct to Unmanaged Memory”
on page 984 in Chapter 25.


The Pointer-to-Member Operator 


In addition to the & and * operators, C# also provides the C++ style -> operator, 
which you can use on structs: 


struct Test
{
  int x;
  unsafe static void Main()
  {
    Test test = new Test();
    Test* p = &test;
    p->x = 9;
    System.Console.WriteLine (test.x);
  }
}


The stackalloc Keyword 


You can allocate memory in a block on the stack explicitly by using the stackalloc 
keyword. Because it is allocated on the stack, its lifetime is limited to the execution 
of the method, just as with any other local variable (whose life hasn't been extended 
by virtue of being captured by a lambda expression, iterator block, or asynchronous 
function). The block can use the [] operator to index into memory: 


int* a = stackalloc int [10]; 
for (int i = 0; i < 10; ++i) 
Console.WriteLine (a[i]); // Print raw memory 
In Chapter 24, we describe how you can use Span<T> to manage stack-allocated 
memory without using the unsafe keyword: 
Span<int> a = stackalloc int [10];
for (int i = 0; i < 10; ++i) 
Console.WriteLine (a[i]); 
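For comparison, here is the Span<T> approach in a complete method. StackDemo and SumOfSquares are hypothetical names; the point is that no unsafe keyword or compiler switch is required, and the span's bounds checking replaces raw pointer indexing.

```csharp
using System;

static class StackDemo
{
  // Safe code: stackalloc is assigned directly to a Span<int>.
  public static int SumOfSquares (int count)
  {
    Span<int> squares = stackalloc int [count];   // keep count small: stack space is limited
    for (int i = 0; i < squares.Length; i++)
      squares[i] = i * i;

    int total = 0;
    foreach (int n in squares)
      total += n;
    return total;
  }
}
```

SumOfSquares (10) returns 285 (0 + 1 + 4 + ... + 81).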


Fixed-Size Buffers 


The fixed keyword has another use, which is to create fixed-size buffers within 
structs (this can be useful when calling an unmanaged function; see Chapter 24): 


unsafe struct UnsafeUnicodeString
{
  public short Length;
  public fixed byte Buffer[30];   // Allocate block of 30 bytes
}

unsafe class UnsafeClass
{
  UnsafeUnicodeString uus;

  public UnsafeClass (string s)
  {
    uus.Length = (short)s.Length;
    fixed (byte* p = uus.Buffer)
      for (int i = 0; i < s.Length; i++)
        p[i] = (byte) s[i];
  }
}

class Test
{
  static void Main() { new UnsafeClass ("Christian Troy"); }
}


Fixed-size buffers are not arrays: if Buffer were an array, it would consist of a
reference to an object stored on the (managed) heap, rather than 30 bytes within the
struct itself.


The fixed keyword is also used in this example to pin the object on the heap that 
contains the buffer (which will be the instance of UnsafeClass). Hence, fixed 
means two different things: fixed in size, and fixed in place. The two are often used 
together, in that a fixed-size buffer must be fixed in place to be used. 


void* 

A void pointer (void*) makes no assumptions about the type of the underlying data 
and is useful for functions that deal with raw memory. An implicit conversion exists 
from any pointer type to void*. A void* cannot be dereferenced, and arithmetic 
operations cannot be performed on void pointers. Here's an example: 


class Test
{
  unsafe static void Main()
  {
    short[] a = {1,1,2,3,5,8,13,21,34,55};
    fixed (short* p = a)
    {
      //sizeof returns size of value-type in bytes
      Zap (p, a.Length * sizeof (short));
    }
    foreach (short x in a)
      System.Console.WriteLine (x);   // Prints all zeros
  }

  unsafe static void Zap (void* memory, int byteCount)
  {
    byte* b = (byte*) memory;
    for (int i = 0; i < byteCount; i++)
      *b++ = 0;
  }
}


Pointers to Unmanaged Code 


Pointers are also useful for accessing data outside the managed heap (such as when 
interacting with C Dynamic-Link Libraries [DLLs] or Component Object Model 
[COM]) or when dealing with data not in the main memory (such as graphics 
memory or a storage medium on an embedded device). 


Preprocessor Directives 


Preprocessor directives supply the compiler with additional information about 
regions of code. The most common preprocessor directives are the conditional 
directives, which provide a way to include or exclude regions of code from 
compilation: 


#define DEBUG
class MyClass
{
  int x;
  void Foo()
  {
    #if DEBUG
    Console.WriteLine ("Testing: x = {0}", x);
    #endif
  }
}

In this class, the statement in Foo is compiled as conditionally dependent upon the
presence of the DEBUG symbol. If we remove the DEBUG symbol, the statement is not
compiled. You can define preprocessor symbols within a source file (as we have
done), or at a project level in the .csproj file:

<PropertyGroup>
  <DefineConstants>DEBUG;ANOTHERSYMBOL</DefineConstants>
</PropertyGroup>

With the #if and #elif directives, you can use the ||, &&, and ! operators to perform
or, and, and not operations on multiple symbols. The following directive
instructs the compiler to include the code that follows if the TESTMODE symbol is
defined and the DEBUG symbol is not defined:


#if TESTMODE && !DEBUG 


Keep in mind, however, that you're not building an ordinary C# expression, and the
symbols upon which you operate have absolutely no connection to variables, static
or otherwise.


The #error and #warning directives prevent accidental misuse of conditional directives
by making the compiler generate a warning or error given an undesirable set of
compilation symbols. Table 4-1 lists the preprocessor directives.


Table 4-1. Preprocessor directives

Preprocessor directive                Action
#define symbol                        Defines symbol
#undef symbol                         Undefines symbol
#if symbol [operator symbol2]...      symbol to test; operators are ==, !=, &&, and ||,
                                      followed by #else, #elif, and #endif
#else                                 Executes code to subsequent #endif
#elif symbol [operator symbol2]       Combines #else branch and #if test
#endif                                Ends conditional directives
#warning text                         text of the warning to appear in compiler output
#error text                           text of the error to appear in compiler output
#pragma warning [disable | restore]   Disables/restores compiler warning(s)
#line [ number ["file"] | hidden]     number specifies the line in source code; file is
                                      the filename to appear in compiler output; hidden
                                      instructs debuggers to skip over code from this
                                      point until the next #line directive
#region name                          Marks the beginning of an outline
#endregion                            Ends an outline region
#nullable option                      See “Nullable Reference Types (C# 8)” on page 191

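A sketch combining several of these directives in one file might look like this. TESTMODE and PRODUCTION are hypothetical symbols. Because #undef DEBUG removes DEBUG for this file even if the project defines it, Mode returns "test" in any build configuration, and the #error line fires only if someone also defines PRODUCTION.

```csharp
// #define and #undef must appear before any other code in the file.
#define TESTMODE
#undef DEBUG

static class BuildInfo
{
  public static string Mode()
  {
    // Guard against an undesirable combination of symbols.
    #if TESTMODE && PRODUCTION
    #error TESTMODE and PRODUCTION must not both be defined
    #endif

    #if TESTMODE && !DEBUG
    return "test";
    #elif DEBUG
    return "debug";
    #else
    return "release";
    #endif
  }
}
```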
Conditional Attributes


An attribute decorated with the Conditional attribute will be compiled only if a 
given preprocessor symbol is present: 


// file1.cs
#define DEBUG
using System;
using System.Diagnostics;
[Conditional("DEBUG")]
public class TestAttribute : Attribute {}

// file2.cs
#define DEBUG
[Test]
class Foo
{
  [Test]
  string s;
}

The compiler will incorporate the [Test] attributes only if the DEBUG symbol is in 
scope for file2.cs. 
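The effect can be observed with reflection. In this sketch, AUDIT is a hypothetical symbol that the file never defines, so the compiler drops [Audit] from the hypothetical Ledger class and LedgerHasAudit returns false. Adding #define AUDIT as the file's first line would make it return true.

```csharp
using System;
using System.Diagnostics;

// AUDIT is a hypothetical symbol that this file never defines.
[Conditional ("AUDIT")]
public class AuditAttribute : Attribute { }

[Audit]   // omitted from the compiled metadata because AUDIT is not defined here
public class Ledger { }

static class ConditionalDemo
{
  public static bool LedgerHasAudit()
    => typeof (Ledger).IsDefined (typeof (AuditAttribute), false);
}
```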


Pragma Warning


The compiler generates a warning when it spots something in your code that seems 
unintentional. Unlike errors, warnings don't ordinarily prevent your application 
from compiling. 


Compiler warnings can be extremely valuable in spotting bugs. Their usefulness,
however, is undermined when you get false warnings. In a large application,
maintaining a good signal-to-noise ratio is essential if the real warnings are to be noticed.


To this effect, the compiler allows you to selectively suppress warnings by using the 
#pragma warning directive. In this example, we instruct the compiler not to warn us 
about the field Message not being used: 


public class Foo
{
  static void Main() { }

  #pragma warning disable 414
  static string Message = "Hello";
  #pragma warning restore 414
}


Omitting the number in the #pragma warning directive disables or restores all 
warning codes. 


If you are thorough in applying this directive, you can compile with 
the /warnaserror switch—this instructs the compiler to treat any residual warnings 
as errors. 
XML Documentation


A documentation comment is a piece of embedded XML that documents a type or 
member. A documentation comment comes immediately before a type or member 
declaration and starts with three slashes: 


/// <summary>Cancels a running query.</summary> 
public void Cancel() { ... } 


You can do multiline comments either like this: 


/// <summary> 
/// Cancels a running query 
/// </summary> 
public void Cancel() { ... } 


or like this (notice the extra star at the start): 


/**
    <summary> Cancels a running query. </summary>
*/
public void Cancel() { ... } 


If you add the following option to your .csproj file: 


<PropertyGroup>
  <DocumentationFile>SomeFile.xml</DocumentationFile>
</PropertyGroup>

the compiler extracts and collates documentation comments into the specified XML 
file. This has two main uses: 


• If placed in the same folder as the compiled assembly, tools such as Visual Studio
  and LINQPad automatically read the XML file and use the information to
  provide IntelliSense member listings to consumers of the assembly of the same
  name.
• Third-party tools (such as Sandcastle and NDoc) can transform the XML file
  into an HTML help file.


Standard XML Documentation Tags 
Here are the standard XML tags that Visual Studio and documentation generators
recognize:


<summary> 
<summary>...</summary>


Indicates the tool tip that IntelliSense should display for the type or member; 
typically a single phrase or sentence. 







<remarks> 
<remarks>...</remarks> 


Additional text that describes the type or member. Documentation generators 
pick this up and merge it into the bulk of a type or member's description. 
<param> 
<param name="name">...</param> 
Explains a parameter on a method. 
<returns> 
<returns>...</returns> 
Explains the return value for a method. 
<exception> 
<exception [cref="type"]>...</exception> 
Lists an exception that a method can throw (cref refers to the exception type). 
<permission> 
<permission [cref="type"]>...</permission>
Indicates an IPermission type required by the documented type or member. 
<example> 
<example>...</example>


Denotes an example (used by documentation generators). This usually contains 
both description text and source code (source code is typically within a <c> or 
<code> tag). 


<c> 
<c>...</c> 


Indicates an inline code snippet. This tag is usually used within an <example> 
block. 


<code> 
<code>...</code>


Indicates a multiline code sample. This tag is usually used within an <example> 
block. 


<see> 
<see cref="member">...</see> 


Inserts an inline cross-reference to another type or member. HTML documentation
generators typically convert this to a hyperlink. The compiler emits a
warning if the type or member name is invalid. To refer to generic types, use
curly braces; for example, cref="Foo{T,U}".
<seealso>
<seealso cref="member">...</seealso> 


Cross-references another type or member. Documentation generators typically 
write this into a separate “See Also” section at the bottom of the page. 


<paramref> 
<paramref name="name" /> 


References a parameter from within a <summary> or <remarks> tag. 


<list> 
<list type=[ bullet | number | table ]> 
<listheader> 
<term>...</term> 
<description>...</description> 
</listheader> 
<item> 
<term>...</term> 
<description>...</description> 
</item> 
</list> 
Instructs documentation generators to emit a bulleted, numbered, or table-style
list.


<para> 
<para>...</para> 


Instructs documentation generators to format the contents into a separate 
paragraph. 


<include> 
<include file='filename' path='tagpath[@name="id"]'>...</include> 


Merges an external XML file that contains documentation. The path attribute 
denotes an XPath query to a specific element in that file. 
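A sketch of how several of these tags typically combine on a single method follows. MathUtil.Clamp is a hypothetical helper (not the framework's own Math.Clamp); with the DocumentationFile option shown earlier, the compiler would collate these comments into the XML file and verify the param names and cref targets.

```csharp
using System;

public static class MathUtil
{
  /// <summary>Restricts a value to an inclusive range.</summary>
  /// <param name="value">The value to restrict.</param>
  /// <param name="min">The lower bound.</param>
  /// <param name="max">The upper bound; must not be less than <paramref name="min"/>.</param>
  /// <returns><paramref name="value"/>, or the nearer bound if it lies outside the range.</returns>
  /// <exception cref="ArgumentException">
  /// Thrown when <paramref name="max"/> is less than <paramref name="min"/>.
  /// </exception>
  /// <example>
  /// <code>int x = MathUtil.Clamp (15, 0, 10);   // x is 10</code>
  /// </example>
  /// <seealso cref="Math.Max(int, int)"/>
  public static int Clamp (int value, int min, int max)
  {
    if (max < min) throw new ArgumentException ("max must not be less than min.");
    return value < min ? min : value > max ? max : value;
  }
}
```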


User-Defined Tags 


Little is special about the predefined XML tags recognized by the C# compiler, and 
you are free to define your own. The only special processing done by the compiler is 
on the <param> tag (in which it verifies the parameter name and that all the parameters
on the method are documented) and the cref attribute (in which it verifies that
the attribute refers to a real type or member and expands it to a fully qualified type 
or member ID). You can also use the cref attribute in your own tags; it is verified 
and expanded just as it is in the predefined <exception>, <permission>, <see>, and 
<seealso> tags. 
Type or Member Cross-References


Type names and type or member cross-references are translated into IDs that 
uniquely define the type or member. These names are composed of a prefix that 
defines what the ID represents and a signature of the type or member. Following are 
the member prefixes: 


XML type prefix   ID prefixes applied to...
N                 Namespace
T                 Type (class, struct, enum, interface, delegate)
F                 Field
P                 Property (includes indexers)
M                 Method (includes special methods)
E                 Event
!                 Error


The rules describing how the signatures are generated are well documented,
although fairly complex.


Here is an example of a type and the IDs that are generated: 


// Namespaces do not have independent signatures
namespace NS
{
  /// T:NS.MyClass
  class MyClass
  {
    /// F:NS.MyClass.aField
    string aField;

    /// P:NS.MyClass.aProperty
    short aProperty {get {...} set {...}}

    /// T:NS.MyClass.NestedType
    class NestedType {...};

    /// M:NS.MyClass.X
    void X() {...}

    /// M:NS.MyClass.Y(System.Int32,System.Double@,System.Decimal@)
    void Y(int p1, ref double p2, out decimal p3) {...}

    /// M:NS.MyClass.Z(System.Char[],System.Single[0:,0:])
    void Z(char[] p1, float[,] p2) {...}

    /// M:NS.MyClass.op_Addition(NS.MyClass,NS.MyClass)
    public static MyClass operator+(MyClass c1, MyClass c2) {...}

    /// M:NS.MyClass.op_Implicit(NS.MyClass)~System.Int32
    public static implicit operator int(MyClass c) {...}

    /// M:NS.MyClass.#ctor
    MyClass() {...}

    /// M:NS.MyClass.Finalize
    ~MyClass() {...}

    /// M:NS.MyClass.#cctor
    static MyClass() {...}
  }
}
Framework Overview


Almost all of the capabilities of .NET Core are exposed via a vast set of managed
types. These types are organized into hierarchical namespaces and packaged into a
set of assemblies, which together with the CLR comprise the .NET Core platform.


Some of the .NET types are used directly by the CLR and are essential for the managed
hosting environment. These types reside in an assembly called
System.Private.CoreLib.dll. They include C#’s built-in types as well as the basic collection
classes, and types for stream processing, serialization, reflection, threading, and
native interoperability.


System.Private.CoreLib.dll replaces .NET Framework’s mscorlib.dll.
Many places in the official documentation still refer to
mscorlib.


At a level above this are additional types that “flesh out” the CLR-level functionality, 
providing features such as XML, JSON, networking, and Language-Integrated 
Query (LINQ). These comprise the Base Class Library (BCL). Sitting above this are 
application frameworks, which provide APIs for developing particular kinds of 
applications such as web or rich client. 


In this chapter, we provide the following: 


• An overview of the BCL (which we cover in the rest of the book)
• A high-level summary of the available application frameworks


.NET Standard 


In Chapter 1, we said that there are four major framework choices: 


• .NET Core
• UWP
• Mono + Xamarin (for mobile device development)
• .NET Framework





What's New in .NET Core 3 


The new core features of .NET Core 3 include a built-in high-performance JSON
reader/writer and serializer (see Chapter 11), and support for C# 8 and .NET
Standard 2.1.


Note that .NET Core 3 succeeds both .NET Core 2.2 and .NET Framework. If you're 
coming from .NET Framework, the following features are all new: 


• Built-in immutable collections (see “Immutable Collections” on page 357 in
  Chapter 7)
• AssemblyLoadContext: a new API for loading, resolving, and isolating assemblies
  that significantly improves on Assembly.LoadFile and Assembly.LoadFrom
  (see “Loading, Resolving, and Isolating Assemblies” on page 775 in Chapter 18)
• The Span<T> and Memory<T> structs, which help reduce memory allocations in
  performance hotspots (see Chapter 24)
• Array and memory pooling to reduce the load on the garbage collector (see
  “Array Pooling” on page 541 in Chapter 12)
• A new EventCounter API for performance monitoring (see Chapter 13)
• Startup hooks for injecting code that runs before an application’s Main method


As well as general performance improvements to the CLR and garbage collector,
.NET Core 3’s CLR includes an improvement to Just-In-Time (JIT) compilation,
called tiered compilation, whereby the CLR automatically identifies performance
hotspots as a program is running and then selectively re-JITs the Intermediate
Language to higher-quality native code.


.NET Core 3 also has new deployment features:


• An Ahead-Of-Time (AOT) compilation option, allowing an application to be
  compiled to native code before being deployed (in lieu of .NET Framework’s
  client-side native image generation tool)
• Support for single-file executables with assembly linking to trim unused
  assemblies
• Support for MSIX, a new Windows deployment format


Some .NET Framework APIs are absent from .NET Core 3, notably Windows
Communication Foundation (WCF), Windows Workflow, Web Forms, Remoting, and
application domains. AssemblyLoadContext provides a partial replacement for
application domains (see Chapter 18) with reduced isolation and limited support for
unloading.
Each framework contains its own CLR and BCL. The good news is that at the time
of .NET Core 2.0’s release, these frameworks converged in their core functionality, 
and now all offer a BCL with similar types and members. This commonality has 
been formalized into a standard called .NET Standard 2.0. 


.NET Standard 2.0 


A library that targets .NET Standard 2.0 instead of a specific framework is usefully
portable. The same assembly will run without modification on most of today’s
popular frameworks, including the following:


• .NET Core 2.0+
• UWP 10.0.16299+
• Mono 5.4+
• .NET Framework 4.6.1+


To target .NET Standard 2.0, add the following to your .csproj file:


<PropertyGroup>
  <TargetFramework>netstandard2.0</TargetFramework>
</PropertyGroup>


.NET Standard is not a framework; it's merely a specification
describing a minimum baseline of functionality (types and
members) that guarantees compatibility with a certain set of
frameworks. The concept is similar to C# interfaces: .NET
Standard is like an interface that concrete types (frameworks)
can implement.


.NET Standard 2.1


.NET Core 3 also supports .NET Standard 2.1, a superset of .NET Standard 2.0 that
exposes most of the additional types that were introduced with .NET Core 3.
However, .NET Standard 2.1 is not supported by any version of .NET Framework (and
not even by UWP as of this writing), making it much less useful than .NET
Standard 2.0.

The following APIs, in particular, are available in .NET Standard 2.1 (but not .NET 
Standard 2.0): 


• Span<T> (Chapter 24)
• Reflection.Emit (Chapter 19)
• ValueTask<T> (Chapter 14)
Older .NET Standards


There are also older .NET Standards, most notably 1.1, 1.2, 1.3, and 1.6. A
higher-numbered standard is always a strict superset of a lower-numbered standard. For
instance, if you write a library that targets .NET Standard 1.6, you will support not
only recent versions of the major frameworks, but also .NET Core 1.0. And if you
target .NET Standard 1.3, you support everything we've already mentioned
plus .NET Framework 4.6.0. The following table elaborates:


If you target...   You also support...
Standard 1.6       .NET Core 1.0
Standard 1.3       Above plus .NET 4.6.0
Standard 1.2       Above plus .NET 4.5.1, Windows Phone 8.1, WinRT for Windows 8.1
Standard 1.1       Above plus .NET 4.5.0, Windows Phone 8.0, WinRT for Windows 8.0

The 1.x standards lack thousands of APIs that are present in 
2.0, including much of what we describe in this book. This can 
make targeting a 1.x standard significantly more challenging, 
especially if you need to integrate existing code or libraries. 


You can also think of .NET Standard as a lowest common denominator. In the case
of .NET Standard 2.0, the frameworks that implement it have a similar BCL, so the
lowest common denominator is big and useful. However, if you also want compatibility
with .NET Core 1.0 (with its significantly cut-down BCL), the lowest common
denominator (.NET Standard 1.x) becomes much smaller and less useful.


.NET Framework and .NET Core Compatibility 


Because .NET Framework has existed for so long, it’s not uncommon to encounter 
libraries that are available only for .NET Framework (with no .NET Standard 
or .NET Core equivalent). To help mitigate this situation, .NET Core projects are 
permitted to reference .NET Framework assemblies, with the following provisos: 


• An exception is thrown should the .NET Framework assembly call an API
that’s not supported in .NET Core.
• Nontrivial dependencies might fail to resolve.

In practice, it’s most likely to work with assemblies that perform a simple function,
such as wrapping an unmanaged DLL, or that rely on a single well-supported API,
such as WPF or Windows Forms.


Framework and C# Language Versions 


The C# compiler chooses a language version automatically based on the framework 
that your project targets: 
• For .NET Core 3.x and .NET Standard 2.1, it chooses C# 8.

• For .NET Core 2.x, .NET Framework, and .NET Standard 2.0 and below, it
chooses C# 7.3.


This is because C# 8’s new features rely on types that are available only in .NET 
Core 3+ or .NET Standard 2.1+. 


Reference Assemblies 


When you target .NET Standard, your project implicitly references an assembly 
called netstandard.dll, which contains all of the allowable types and members for 
your chosen version of .NET Standard. This is called a reference assembly because it 
exists only for the benefit of the compiler and contains no compiled code. At run- 
time, the “real” assemblies are identified through assembly redirection attributes 
(the choice of assemblies will depend on which framework and platform the assem- 
bly eventually runs on). 


Interestingly, a similar thing happens when you target .NET Core. Your project 
implicitly references a set of reference assemblies whose types mirror what's in the 
runtime assemblies for the chosen .NET Core version. This helps with versioning 
and cross-platform compatibility, and also allows you to target a different .NET 
Core version than what is installed on your machine. For instance, if you've 
installed .NET Core 3, your project can still target .NET Core 2.2, and thanks to a
set of reference assemblies, the compiler will see only the types and members avail- 
able to .NET Core 2.2. 


The CLR and BCL 
System Types 


The most fundamental types live directly in the System namespace. These include 
C#’s built-in types, the Exception base class, the Enum, Array, and Delegate base 
classes, and Nullable, Type, DateTime, TimeSpan, and Guid. The System namespace 
also includes types for performing mathematical functions (Math), generating ran- 
dom numbers (Random), and converting between various types (Convert and 
BitConverter). 


Chapter 6 describes these types as well as the interfaces that define standard proto- 
cols used across the Framework for such tasks as formatting (IFormattable) and 
order comparison (IComparable). 


The System namespace also defines the IDisposable interface and the GC class for 
interacting with the garbage collector, which we cover in Chapter 12. 
Text Processing


The System.Text namespace contains the StringBuilder class (the editable or 
mutable cousin of string) and the types for working with text encodings, such as 
UTF-8 (Encoding and its subtypes). We cover this in Chapter 6. 


The System.Text.RegularExpressions namespace contains types that perform
advanced pattern-based search-and-replace operations; we describe these in 
Chapter 26. 


Collections 


.NET Core offers a variety of classes for managing collections of items. These 
include both list- and dictionary-based structures; they work in conjunction with a 
set of standard interfaces that unify their common characteristics. All collection 
types are defined in the following namespaces, covered in Chapter 7: 


System.Collections                 // Nongeneric collections
System.Collections.Generic         // Generic collections
System.Collections.Specialized     // Strongly typed collections
System.Collections.ObjectModel     // Bases for your own collections
System.Collections.Concurrent      // Thread-safe collections (Chapter 23)


Querying 
LINQ allows you to perform type-safe queries over local and remote collections
(e.g., SQL Server tables) and is described in Chapters 8 through 10. A big advantage
of LINQ is that it presents a consistent querying API across a variety of domains.
The essential types reside in the following namespaces:


System.Linq                 // LINQ to Objects and PLINQ
System.Linq.Expressions     // For building expressions manually
System.Xml.Linq             // LINQ to XML


XML and JSON 


XML and JSON are widely supported in .NET Core. Chapter 10 focuses entirely on 
LINQ to XML—a lightweight XML Document Object Model (DOM) that can be 
constructed and queried through LINQ. Chapter 11 covers the performant low-level 
XML reader/writer classes, XML schemas and stylesheets, and the types for working 
with JSON: 


System.Xml                  // XmlReader, XmlWriter
System.Xml.Linq             // The LINQ to XML DOM
System.Xml.Schema           // Support for XSD
System.Xml.Serialization    // Declarative XML serialization for .NET types
System.Xml.XPath            // XPath query language
System.Xml.Xsl              // Stylesheet support
System.Text.Json            // JSON reader/writer and document object model


In Chapter 17 (Serialization), we cover the JSON serializer. 
Diagnostics


In Chapter 13, we cover logging and assertion and describe how to interact with 
other processes, write to the Windows event log, and handle performance
monitoring. The types for this are defined in and under System.Diagnostics.


Concurrency and Asynchrony 


Many modern applications need to deal with more than one thing happening at a 
time. Since C# 5.0, this has become easier through asynchronous functions and 
high-level constructs such as tasks and task combinators. Chapter 14 explains all of 
this in detail, after starting with the basics of multithreading. Types for working 
with threads and asynchronous operations are in the System.Threading and
System.Threading.Tasks namespaces.


Streams and I/O


.NET Core provides a stream-based model for low-level I/O. Streams are typically 
used to read and write directly to files and network connections, and can be chained 
or wrapped in decorator streams to add compression or encryption functionality. 
Chapter 15 describes the stream architecture as well as the specific support for 
working with files and directories, compression, pipes, and memory-mapped files. 
The Stream and I/O types are defined in and under the System.IO namespace, and
the Windows Runtime (WinRT) types for file I/O are in and under
Windows.Storage.


Networking 


You can directly access standard network protocols such as HTTP, FTP, TCP/IP, and 
SMTP via the types in System.Net. In Chapter 16, we demonstrate how to commu- 
nicate using each of these protocols, starting with simple tasks such as downloading 
from a web page and finishing with using TCP/IP directly to retrieve POP3 email. 
Here are the namespaces we cover: 


System.Net
System.Net.Http             // HttpClient
System.Net.Mail             // For sending mail via SMTP
System.Net.Sockets          // TCP, UDP, and IP


Serialization


The Framework provides several systems for saving and restoring objects to a 
binary or text representation. Such systems can be required for communication as 
well as saving and restoring objects to a file. In Chapter 17, we cover the three major 
serialization engines: the binary serializer, the JSON serializer, and the XML serial- 
izer. The types for serialization reside in the following namespaces: 


System.Runtime.Serialization
System.Xml.Serialization
System.Text.Json
Assemblies, Reflection, and Attributes


The assemblies into which C# programs compile comprise executable instructions 
(stored as IL) and metadata, which describes the program’s types, members, and 
attributes. Through reflection, you can inspect this metadata at runtime and do such 
things as dynamically invoke methods. With Reflection.Emit, you can construct 
new code on the fly. 


In Chapter 18, we describe the makeup of assemblies and how to dynamically load 
and isolate them. In Chapter 19, we cover reflection and attributes—describing how 
to inspect metadata, dynamically invoke functions, write custom attributes, emit 
new types, and parse raw IL. The types for using reflection and working with assem- 
blies reside in the following namespaces: 


System 
System.Reflection 
System.Reflection.Emit


Dynamic Programming 


In Chapter 20, we look at some of the patterns for dynamic programming and uti- 
lizing the Dynamic Language Runtime (DLR). We describe how to implement the 
Visitor pattern, write custom dynamic objects, and interoperate with IronPython. 
The types for dynamic programming are in System.Dynamic.


Cryptography 

.NET Core provides extensive support for popular hashing and encryption proto- 
cols. In Chapter 21, we cover hashing, symmetric and public-key encryption, and 
the Windows Data Protection API. The types for this are defined in: 


System.Security
System.Security.Cryptography 


Advanced Threading 


C#’s asynchronous functions make concurrent programming significantly easier 
because they lessen the need for lower-level techniques. However, there are still 
times when you need signaling constructs, thread-local storage, reader/writer locks, 
and so on. Chapter 22 explains this in depth. Threading types are in the 
System.Threading namespace.


Parallel Programming 


In Chapter 23, we cover in detail the libraries and types for leveraging multicore 
processors, including APIs for task parallelism, imperative data parallelism, and 
functional parallelism (PLINQ). 
Span<T> and Memory<T>


To help with micro-optimizing performance hotspots, the CLR provides a number 
of types to help you program in such a way as to reduce the load on the memory 
manager. Two of the key types are Span<T> and Memory<T>, which we describe in 
Chapter 24. 


Native and COM Interoperability 


You can interoperate with both native and Component Object Model (COM) code. 
Native interoperability allows you to call functions in unmanaged DLLs, register 
callbacks, map data structures, and interoperate with native data types. COM inter- 
operability allows you to call COM types (on Windows machines), and expose .NET 
Core types to COM. The types that support these functions are in 
System.Runtime.InteropServices, and we cover them in Chapter 25.


Regular Expressions 


In Chapter 26, we cover how you can use regular expressions to match character 
patterns in strings. 


The Roslyn Compiler 


The C# compiler itself is written in C#—the project is called “Roslyn,” and the libra- 
ries are available as NuGet packages. With these libraries, you can utilize the com- 
piler’s functionality in many ways besides compiling source code to an assembly, 
such as writing code analysis and refactoring tools. We cover this topic in 
Chapter 27. 


Application Frameworks 


UI-based applications can be divided into two categories: thin client, which 
amounts to a website, and rich client, which is a program the end user must down- 
load and install on a computer or mobile device. 


For writing thin-client applications in C#, there’s ASP.NET Core, which runs on 
Windows, Linux, and macOS. ASP.NET Core is also designed for writing web APIs. 


For rich-client applications, there is a choice of APIs:
• The Windows Desktop framework includes the WPF and Windows Forms
APIs, and runs on Windows 7/8/10 desktop.
• UWP runs on Windows 10 desktop and devices.
• Xamarin runs on iOS and Android mobile devices.


There are also third-party libraries, such as Avalonia, which offers cross-platform 
UI support. 
ASP.NET Core


ASP.NET Core is a lightweight modular successor to ASP.NET, with support for the 
popular Model-View-Controller (MVC) pattern. ASP.NET Core is suitable for creat- 
ing websites, REST-based web APIs, and microservices. It can also run in conjunc- 
tion with two popular single-page-application frameworks: React and Angular. 


ASP.NET Core runs on Windows, Linux, and macOS and can self-host in a custom 
process. Unlike its predecessor (ASP.NET), ASP.NET Core is not dependent on
System.Web and the historical baggage of web forms. 


As with any thin-client architecture, ASP.NET Core offers the following general
advantages over rich clients:

• There is zero deployment at the client end.

• The client can run on any platform that supports a web browser.

• Updates are easily deployed.


Windows Desktop 


The Windows Desktop application framework offers a choice of two UI APIs for 
writing rich-client applications: WPF and Windows Forms. Both APIs run on Win- 
dows Desktop/Server 7 through 10. 


WPF 


WPF was introduced in 2006, and has been enhanced ever since. Unlike its prede- 
cessor, Windows Forms, WPF explicitly renders controls using DirectX, with the 
following benefits: 


• It supports sophisticated graphics, such as arbitrary transformations, 3D
rendering, multimedia, and true transparency. Skinning is supported through
styles and templates.

• Its primary measurement unit is not pixel-based, so applications display
correctly at any DPI setting.

• It has extensive and flexible layout support, which means that you can localize
an application without danger of elements overlapping.

• Its use of DirectX makes rendering fast and able to take advantage of graphics
hardware acceleration.

• It offers reliable data binding.

• UIs can be described declaratively in XAML files that can be maintained
independent of the “code-behind” files, which helps to separate appearance from
functionality.
WPF takes some time to learn due to its size and complexity. The types for writing
WPF applications are in the System.Windows namespace and all subnamespaces
except for System.Windows.Forms.


Windows Forms 


Windows Forms is a rich-client API that shipped with the first version of .NET 
Framework in 2000. Compared to WPF, Windows Forms is a relatively simple tech- 
nology that provides most of the features you need in writing a typical Windows 
application. It also has significant relevancy in maintaining legacy applications. But
compared to WPF, it has numerous drawbacks, most of which stem from it being a
wrapper over GDI+ and the Win32 control library:


• Although it provides mechanisms for DPI awareness, it’s still too easy to write
applications that break on clients whose DPI settings differ from the
developer's.

• The API for drawing nonstandard controls is GDI+, which, although reasonably
flexible, is slow in rendering large areas (and, without double buffering,
might flicker).

• Controls lack true transparency.

• Most controls are noncompositional. For instance, you can’t put an image
control inside a tab control header. Customizing list views, combo boxes, and tab
controls in a way that would be trivial with WPF is time consuming and
painful in Windows Forms.

• Dynamic layout is difficult to correctly implement.


The last point is an excellent reason to favor WPF over Windows Forms, even if
you're writing a business application that needs just a UI and not a “user
experience.” The layout elements in WPF, such as Grid, make it easy to assemble labels
and text boxes such that they always align—even after language-changing localiza- 
tion—without messy logic and without any flickering. Further, you don’t need to 
bow to the lowest common denominator in screen resolution—WPF layout ele- 
ments have been designed from the outset to adapt properly to resizing. 


On the positive side, Windows Forms is relatively simple to learn and still has a 
good number of third-party controls. 


The Windows Forms types are in the System.Windows.Forms (in
System.Windows.Forms.dll) and System.Drawing (in System.Drawing.dll) namespaces.
The latter also contains the GDI+ types for drawing custom controls.


UWP 


UWP is a rich-client API for writing touch-first UIs that target Windows 10 desktop 
and devices. The word “Universal” refers to its ability to run on a range of Windows 
10 devices, including Xbox, Surface Hub, and HoloLens. However, it’s not compatible
with earlier versions of Windows, including Windows 7 and Windows 8/8.1.


The UWP API uses XAML and is somewhat similar to WPF. Here are its key
differences:


• The primary mode of distribution for UWP apps is the Windows Store.

• UWP apps run in a sandbox to lessen the threat of malware, which means that
they cannot perform tasks such as reading or writing arbitrary files, and they
cannot run with administrative elevation.

• UWP relies on WinRT types that are part of the operating system (Windows),
not the managed framework. This means that when writing apps, you must
nominate a Windows 10 version range (such as Windows 10 build 17763 to
Windows 10 build 18362). As a result, you either need to target an old
API or require that your customers install the latest Windows update.


To address the last point, Microsoft is introducing WinUI 3, which transfers the 
WinRT APIs from the operating system to the framework. WinUI 3 will also help to 
bridge the divide between Windows Desktop and UWP: rather than having to 
choose one or the other, you'll be able to mix and match components from each. 


The namespaces in UWP are Windows.UI and Windows.UI.Xaml.


Xamarin 


Xamarin lets you write mobile apps in C# that target iOS and Android. Xamarin 
doesn't run on .NET Core, but on Mono (a derivation of the open source Mono 
framework). See Xamarin’s website for more information. 
Framework Fundamentals


Many of the core facilities that you need when programming are provided not by 
the C# language, but by types in .NET Core. In this chapter, we cover types that help 
with fundamental programming tasks, such as virtual equality comparison, order 
comparison, and type conversion. We also cover the basic .NET types, such as 
string, DateTime, and Enum. 


The types in this section reside in the System namespace, with the following 
exceptions: 

• StringBuilder is defined in System.Text, as are the types for text encodings.
• CultureInfo and associated types are defined in System.Globalization.
• XmlConvert is defined in System.Xml.


String and Text Handling 


char 


A C# char represents a single Unicode character and aliases the System.Char struct. 
In Chapter 2, we described how to express char literals: 

char c = 'A';
char newLine = '\n';

System.Char defines a range of static methods for working with characters, such as
ToUpper, ToLower, and IsWhiteSpace. You can call these through either the
System.Char type or its char alias:


Console.WriteLine (System.Char.ToUpper ('c'));    // C
Console.WriteLine (char.IsWhiteSpace ('\t')); // True 
ToUpper and ToLower honor the end user’s locale, which can lead to subtle bugs.
The following expression evaluates to false in Turkey: 


char.ToUpper ('i') == 'I'


The reason is that, in Turkey, char.ToUpper ('i') is 'İ' (notice the dot on top!).
To avoid this problem, System.Char (and System.String) also provides culture- 
invariant versions of ToUpper and ToLower ending with the word Invariant. These 
always apply English culture rules: 


Console.WriteLine (char.ToUpperInvariant ('i'));    // I
This is a shortcut for:
Console.WriteLine (char.ToUpper ('i', CultureInfo.InvariantCulture));


For more on locales and culture, see “Formatting and Parsing” on page 270. 
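The following sketch contrasts invariant casing with Turkish casing. It assumes the runtime can load "tr-TR" culture data; on systems running in globalization-invariant mode that culture may be unavailable (or may fall back to invariant rules), so the call is guarded. The ToUpperIn helper is hypothetical, written only for illustration.

```csharp
using System;
using System.Globalization;

static class CasingDemo
{
    // Hypothetical helper: uppercases c using the rules of the named culture.
    // GetCultureInfo throws CultureNotFoundException if the culture is unavailable.
    public static char ToUpperIn (char c, string cultureName) =>
        char.ToUpper (c, CultureInfo.GetCultureInfo (cultureName));

    static void Main()
    {
        // Invariant casing gives the same answer on every machine
        Console.WriteLine (char.ToUpperInvariant ('i'));                       // I
        Console.WriteLine (char.ToUpper ('i', CultureInfo.InvariantCulture));  // I

        try
        {
            // Where Turkish culture data is available, 'i' maps to U+0130
            // (capital I with dot above) rather than to 'I'
            char turkish = ToUpperIn ('i', "tr-TR");
            Console.WriteLine (((int) turkish).ToString ("X4"));
        }
        catch (CultureNotFoundException)
        {
            Console.WriteLine ("tr-TR culture data is not available");
        }
    }
}
```

Only the invariant results are guaranteed to be identical everywhere, which is why the Invariant methods are the safer choice for identifiers, file names, and protocol text.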


Most of char’s remaining static methods are related to categorizing characters. 
Table 6-1 lists these. 


Table 6-1. Static methods for categorizing characters

Static method     Characters included                          Unicode categories included
IsLetter          A-Z, a-z, and letters of other alphabets     UppercaseLetter, LowercaseLetter,
                                                               TitlecaseLetter, ModifierLetter,
                                                               OtherLetter
IsUpper           Uppercase letters                            UppercaseLetter
IsLower           Lowercase letters                            LowercaseLetter
IsDigit           0-9 plus digits of other alphabets           DecimalDigitNumber
IsLetterOrDigit   Letters plus digits                          (IsLetter, IsDigit)
IsNumber          All digits plus Unicode fractions and        DecimalDigitNumber, LetterNumber,
                  Roman numeral symbols                        OtherNumber
IsSeparator       Space plus all Unicode separator characters  LineSeparator, ParagraphSeparator
IsWhiteSpace      All separators plus \n, \r, \t, \f, and \v   LineSeparator, ParagraphSeparator
IsPunctuation     Symbols used for punctuation in Western      DashPunctuation, ConnectorPunctuation,
                  and other alphabets                          InitialQuotePunctuation,
                                                               FinalQuotePunctuation
IsSymbol          Most other printable symbols                 MathSymbol, ModifierSymbol, OtherSymbol
IsControl         Nonprintable "control" characters below      (None)
                  0x20, such as \r, \n, \t, and \0, and
                  characters between 0x7F and 0x9F





For more granular categorization, char provides a static method called
GetUnicodeCategory; this returns a UnicodeCategory enumeration whose members are
shown in the rightmost column of Table 6-1.


By explicitly casting from an integer, it’s possible to produce a 
char outside the allocated Unicode set. To test a character’s 
validity, call char.GetUnicodeCategory: if the result is 
UnicodeCategory.OtherNotAssigned, the character is invalid. 
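Here is a small sketch of both ideas: querying a character's category, and using OtherNotAssigned as a validity test. The IsAssigned helper is a hypothetical name, and U+0378 is used on the assumption that it is a reserved (unassigned) code point in the Greek block of the Unicode tables the runtime ships with.

```csharp
using System;
using System.Globalization;

static class CategoryDemo
{
    // Hypothetical helper: true if c is an assigned character in the
    // runtime's Unicode tables
    public static bool IsAssigned (char c) =>
        char.GetUnicodeCategory (c) != UnicodeCategory.OtherNotAssigned;

    static void Main()
    {
        Console.WriteLine (char.GetUnicodeCategory ('A'));   // UppercaseLetter
        Console.WriteLine (char.GetUnicodeCategory ('7'));   // DecimalDigitNumber
        Console.WriteLine (char.GetUnicodeCategory (' '));   // SpaceSeparator

        // Casting an arbitrary integer to char can yield an unassigned code point
        char c = (char) 0x0378;
        Console.WriteLine (IsAssigned (c));     // False
        Console.WriteLine (IsAssigned ('A'));   // True
    }
}
```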


A char is 16 bits wide—enough to represent any Unicode character in the Basic 
Multilingual Plane. To go beyond this, you must use surrogate pairs: we describe the 
methods for doing this in “Text Encodings and Unicode” on page 253. 
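As a brief preview, the following sketch encodes a code point outside the Basic Multilingual Plane (U+1D11E, MUSICAL SYMBOL G CLEF, chosen arbitrarily) and shows that it occupies two chars: a high surrogate followed by a low surrogate.

```csharp
using System;

static class SurrogateDemo
{
    static void Main()
    {
        // A code point above U+FFFF becomes a two-char string
        string clef = char.ConvertFromUtf32 (0x1D11E);

        Console.WriteLine (clef.Length);                      // 2
        Console.WriteLine (char.IsHighSurrogate (clef[0]));   // True
        Console.WriteLine (char.IsLowSurrogate (clef[1]));    // True

        // Recombine the pair into a single code point
        int codePoint = char.ConvertToUtf32 (clef[0], clef[1]);
        Console.WriteLine (codePoint.ToString ("X"));         // 1D11E
    }
}
```

Note that clef.Length counts chars, not user-visible characters, which is the main practical consequence of surrogate pairs.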


string 


A C# string (== System.String) is an immutable (unchangeable) sequence of 
characters. In Chapter 2, we described how to express string literals, perform equal- 
ity comparisons, and concatenate two strings. This section covers the remaining 
functions for working with strings, exposed through the static and instance mem- 
bers of the System.String class. 


Constructing strings 


The simplest way to construct a string is to assign a literal, as we saw in Chapter 2: 


string s1 = "Hello";
string s2 = "First Line\r\nSecond Line"; 
string s3 = @"\\server\fileshare\helloworld.cs"; 







To create a repeating sequence of characters, you can use string’s constructor: 
Console.Write (new string ('*', 10));      // **********


You can also construct a string from a char array. The ToCharArray method does 
the reverse: 


char[] ca = "Hello".ToCharArray(); 
string s = new string (ca); // s = "Hello" 


string’s constructor is also overloaded to accept various (unsafe) pointer types, in 
order to create strings from types such as char*. 
Null and empty strings


An empty string has a length of zero. To create an empty string, you can use either a 
literal or the static string.Empty field; to test for an empty string, you can either 
perform an equality comparison or test its Length property: 


string empty = ""; 

Console.WriteLine (empty == ""); // True 
Console.WriteLine (empty == string.Empty); // True 
Console.WriteLine (empty.Length == 0); // True 


Because strings are reference types, they can also be null: 


string nullString = null; 

Console.WriteLine (nullString == null); // True 

Console.WriteLine (nullString == ""); // False 
Console.WriteLine (nullString.Length == 0); // NullReferenceException 


The static string.IsNullOrEmpty method is a useful shortcut for testing whether a
given string is either null or empty.
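A quick sketch of IsNullOrEmpty in action, alongside its sibling IsNullOrWhiteSpace, which additionally treats strings made up only of whitespace as blank:

```csharp
using System;

static class NullOrEmptyDemo
{
    static void Main()
    {
        string nullString = null;
        string empty = "";
        string spaces = "  \t ";

        Console.WriteLine (string.IsNullOrEmpty (nullString));   // True
        Console.WriteLine (string.IsNullOrEmpty (empty));        // True
        Console.WriteLine (string.IsNullOrEmpty (spaces));       // False

        // Whitespace-only strings also count as blank here
        Console.WriteLine (string.IsNullOrWhiteSpace (spaces));  // True
    }
}
```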


Accessing characters within a string 


A string’s indexer returns a single character at the given index. As with all functions 
that operate on strings, this is zero-indexed: 


string str = "abcde"; 
char letter = str[1]; // letter == 'b' 


string also implements IEnumerable<char>, so you can foreach over its 
characters: 


foreach (char c in "123") Console.Write (c + ",");    // 1,2,3,


Searching within strings 


The simplest methods for searching within strings are StartsWith, EndsWith, and 
Contains. These all return true or false: 


Console.WriteLine ("quick brown fox".EndsWith ("fox")); // True 
Console.WriteLine ("quick brown fox".Contains ("brown")); // True 


StartsWith and EndsWith are overloaded to let you specify a StringComparison 
enum or a CultureInfo object to control case and culture sensitivity (see “Ordinal 
versus culture comparison” on page 250). The default is to perform a case-sensitive 
match using rules applicable to the current (localized) culture. The following 
instead performs a case-insensitive search using the invariant culture's rules: 


"abcdef".StartsWith ("aBc", StringComparison.InvariantCultureIgnoreCase)


The Contains method doesn't offer the convenience of this overload, although you 
can achieve the same result with the IndexOf method. 
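For example, a case-insensitive "contains" test can be sketched with IndexOf, which accepts a StringComparison and returns -1 when there is no match. The ContainsIgnoreCase name is hypothetical, not a .NET API.

```csharp
using System;

static class ContainsDemo
{
    // Hypothetical helper: case-insensitive substring test built on IndexOf
    public static bool ContainsIgnoreCase (string source, string value) =>
        source.IndexOf (value, StringComparison.OrdinalIgnoreCase) >= 0;

    static void Main()
    {
        Console.WriteLine ("quick brown fox".Contains ("BROWN"));              // False
        Console.WriteLine (ContainsIgnoreCase ("quick brown fox", "BROWN"));   // True
    }
}
```

OrdinalIgnoreCase is used here because it gives the same result on every machine; a culture-sensitive option works the same way if linguistic matching is wanted.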


IndexOf is more powerful: it returns the first position of a given character or sub- 
string (or -1 if the substring isn’t found): 
Console.WriteLine ("abcde".IndexOf ("cd"));   // 2


IndexOf is also overloaded to accept a startPosition (an index from which to 
begin searching) as well as a StringComparison enum: 


Console.WriteLine ("abcde abcde".IndexOf ("CD", 6, 
StringComparison.CurrentCultureIgnoreCase) ); // 8 


LastIndexOf is like IndexOf, but it works backward through the string. 
IndexOfAny returns the first matching position of any one of a set of characters: 


Console.Write ("ab,cd ef".IndexOfAny (new char[] {' ', ','} )); // 2 
Console.Write ("pas5w0rd".IndexOfAny ("0123456789".ToCharArray() ));  // 3


LastIndexOfAny does the same in the reverse direction. 
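A short sketch of the reverse-direction methods. The ExtensionOf helper is hypothetical and shows a common use of LastIndexOf: finding the final separator in a string.

```csharp
using System;

static class LastIndexDemo
{
    // Hypothetical helper: the text after the final '.', or "" if there is none
    public static string ExtensionOf (string fileName)
    {
        int dot = fileName.LastIndexOf ('.');
        return dot < 0 ? "" : fileName.Substring (dot + 1);
    }

    static void Main()
    {
        Console.WriteLine ("abcde abcde".LastIndexOf ("cd"));                   // 8
        Console.WriteLine ("ab,cd ef".LastIndexOfAny (new char[] {' ', ','}));  // 5
        Console.WriteLine (ExtensionOf ("report.final.txt"));                   // txt
    }
}
```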


Manipulating strings 


Because string is immutable, all the methods that manipulate a string return a new 
one, leaving the original untouched (the same goes for when you reassign a string 
variable). 


Substring extracts a portion of a string: 


string left3 = "12345".Substring (0, 3); // left3 = "123"; 
string mid3 = "12345".Substring (1, 3); // mid3 = "234"; 


If you omit the length, you get the remainder of the string: 
string end3 = "12345".Substring (2); // end3 = "345"; 
Insert and Remove insert or remove characters at a specified position: 


string s1 = "helloworld".Insert (5, ", ");    // s1 = "hello, world"
string s2 = s1.Remove (5, 2);                 // s2 = "helloworld";


PadLeft and PadRight pad a string to a given length with a specified character (or a 
space if unspecified): 
Console.WriteLine ("12345".PadLeft (9, '*'));  // ****12345
Console.WriteLine ("12345".PadLeft (9));       //     12345


If the input string is longer than the padding length, the original string is returned 
unchanged. 
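The following sketch shows that padding never truncates, and how PadRight adds characters on the right:

```csharp
using System;

static class PadDemo
{
    static void Main()
    {
        // A total width shorter than the string leaves it unchanged
        Console.WriteLine ("12345".PadLeft (3));               // 12345

        // PadRight appends the padding character on the right
        Console.WriteLine ("12345".PadRight (8, '.') + "|");   // 12345...|
    }
}
```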


TrimStart and TrimEnd remove specified characters from the beginning or end of a 
string; Trim does both. By default, these functions remove whitespace characters 
(including spaces, tabs, newlines, and Unicode variations of these): 


Console.WriteLine ("   abc \t\r\n ".Trim().Length);   // 3
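When you pass characters to these methods, they trim those characters instead of whitespace. A minimal sketch:

```csharp
using System;

static class TrimDemo
{
    static void Main()
    {
        // Trim only the specified character
        Console.WriteLine ("---abc---".Trim ('-'));        // abc
        Console.WriteLine ("---abc---".TrimStart ('-'));   // abc---
        Console.WriteLine ("---abc---".TrimEnd ('-'));     // ---abc

        // Several characters can be given at once; their order doesn't matter
        Console.WriteLine ("  007".TrimStart (' ', '0'));  // 7
    }
}
```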


Replace replaces all (non-overlapping) occurrences of a particular character or 
substring: 


Console.WriteLine ("to be done".Replace (" ", " | ") );  // to | be | done
Console.WriteLine ("to be done".Replace (" ", "")    );  // tobedone
ToUpper and ToLower return uppercase and lowercase versions of the input string.
By default, they honor the user’s current language settings; ToUpperInvariant and
ToLowerInvariant always apply English alphabet rules.


Splitting and joining strings 
Split divides a string into pieces: 
string[] words = "The quick brown fox".Split(); 


foreach (string word in words) 
Console.Write (word + "|"); // The|quick|brown|fox| 


By default, Split uses whitespace characters as delimiters; it’s also overloaded to 
accept a params array of char or string delimiters. Split also optionally accepts a 
StringSplitOptions enum, which has an option to remove empty entries: this is 
useful when words are separated by several delimiters in a row. 
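The effect of StringSplitOptions can be sketched with a comma-separated line that contains consecutive delimiters. The input string is an arbitrary example.

```csharp
using System;

static class SplitDemo
{
    static void Main()
    {
        string line = "red,,green, ,blue";

        // Consecutive delimiters produce empty entries by default
        string[] all = line.Split (',');
        Console.WriteLine (all.Length);                   // 5

        // RemoveEmptyEntries drops zero-length pieces (but keeps " ")
        string[] parts = line.Split (new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine (parts.Length);                 // 4
        Console.WriteLine (string.Join ("|", parts));     // red|green| |blue
    }
}
```

Note that " " survives because RemoveEmptyEntries removes only zero-length entries, not whitespace-only ones.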


The static Join method does the reverse of Split. It requires a delimiter and string
array:


string[] words = "The quick brown fox".Split(); 

string together = string.Join (" ", words);    // The quick brown fox

The static Concat method is similar to Join but accepts only a params string array
and applies no separator. Concat is exactly equivalent to the + operator (the com- 
piler, in fact, translates + to Concat): 


string sentence = string.Concat ("The", " quick", " brown", " fox"); 
string sameSentence = "The" + " quick" + " brown" + " fox"; 


string.Format and composite format strings 


The static Format method provides a convenient way to build strings that embed 
variables. The embedded variables (or values) can be of any type; Format simply
calls ToString on them.


The master string that includes the embedded variables is called a composite format 
string. When calling string.Format, you provide a composite format string fol- 
lowed by each of the embedded variables: 


string composite = "It's {0} degrees in {1} on this {2} morning"; 
string s = string.Format (composite, 35, "Perth", DateTime.Now.DayOfWeek);


// s == "It's 35 degrees in Perth on this Friday morning" 
(And that’s Celsius!) 


We can use interpolated string literals to the same effect (see “String Type” on page 
46 in Chapter 2). Just precede the string with the $ symbol and put the expressions 
in braces: 


string s = $"It's hot this {DateTime.Now.DayOfWeek} morning"; 
Each number in curly braces is called a format item. The number corresponds to the
argument position and is optionally followed by: 

• A comma and a minimum width to apply
• A colon and a format string

The minimum width is useful for aligning columns. If the value is negative, the data
is left-aligned; otherwise, it’s right-aligned:


string composite = "Name={0,-20} Credit Limit={1,15:C}"; 


Console.WriteLine (string.Format (composite, "Mary", 500)); 
Console.WriteLine (string.Format (composite, "Elizabeth", 20000)); 


Here’s the result: 


Name=Mary                 Credit Limit=        $500.00
Name=Elizabeth            Credit Limit=     $20,000.00


Here's the equivalent without using string.Format:


string s = "Name=" + "Mary".PadRight (20) +
           " Credit Limit=" + 500.ToString ("C").PadLeft (15);


The credit limit is formatted as currency by virtue of the "C" format string. We 
describe format strings in detail in “Formatting and Parsing” on page 270. 
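Because "C" depends on the current culture's currency settings, the output shown above assumes a US-English culture. The following sketch instead uses the "N2" format string (a number with two decimal places) together with the invariant culture, so that the result is the same on any machine. The Row helper and the column widths are illustrative choices, not part of the original example.

```csharp
using System;
using System.Globalization;

static class AlignmentDemo
{
    // Hypothetical helper: name left-aligned in 10 chars, amount right-aligned
    // in 12 chars, formatted with the invariant culture
    public static string Row (string name, decimal amount) =>
        string.Format (CultureInfo.InvariantCulture, "{0,-10}|{1,12:N2}|", name, amount);

    static void Main()
    {
        Console.WriteLine (Row ("Mary", 500));         // Mary      |      500.00|
        Console.WriteLine (Row ("Elizabeth", 20000));  // Elizabeth |   20,000.00|

        // Interpolated strings accept the same ",width:format" suffix
        string name = "Mary";
        decimal amount = 500;
        Console.WriteLine (FormattableString.Invariant ($"{name,-10}|{amount,12:N2}|"));
    }
}
```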


Comparing Strings 


In comparing two values, .NET Core differentiates the concepts of equality compari- 
son and order comparison. Equality comparison tests whether two instances are 
semantically the same; order comparison tests which of two (if any) instances comes 
first when arranging them in ascending or descending sequence. 


Equality comparison is not a subset of order comparison; the 
two systems have different purposes. It’s legal, for instance, to 
have two unequal values in the same ordering position. We 
resume this topic in “Equality Comparison” on page 296. 
For string equality comparison, you can use the == operator or one of string’s
Equals methods. The latter are more versatile because they allow you to specify 
options such as case insensitivity. 


Another difference is that == does not work reliably on strings 
if the variables are cast to the object type. We explain why 
this is so in “Equality Comparison” on page 296. 
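To see the pitfall in action, the following sketch compares two strings with equal contents but separate instances. The call to new string (char[]) is there only to force a second instance (identical literals are interned and would share one), and the helper name EqualViaObject is hypothetical.

```csharp
using System;

public static class ObjectEqualityDemo
{
    // With both operands typed as object, == binds at compile time to
    // object's reference comparison, not to string's overloaded operator.
    public static bool EqualViaObject (object a, object b) => a == b;

    public static void Main()
    {
        string s1 = "foo";
        string s2 = new string (new[] { 'f', 'o', 'o' });   // same text, separate instance

        Console.WriteLine (s1 == s2);                   // True  (string's == compares characters)
        Console.WriteLine (EqualViaObject (s1, s2));    // False (the references differ)
        Console.WriteLine (((object)s1).Equals (s2));   // True  (virtual Equals runs string's override)
    }
}
```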


For string order comparison, you can use either the CompareTo instance method or
the static Compare and CompareOrdinal methods. These return a positive or negative
number, or zero, depending on whether the first value comes after, before, or
alongside the second.
Before going into the details of each, we need to examine .NET’s underlying string
comparison algorithms. 


Ordinal versus culture comparison 


There are two basic algorithms for string comparison: ordinal and culture sensitive.
Ordinal comparisons interpret characters simply as numbers (according to their
numeric Unicode value); culture-sensitive comparisons interpret characters with
reference to a particular alphabet. There are two special cultures: the current culture,
which is based on settings picked up from the computer’s control panel, and the
invariant culture, which is the same on every computer (and closely matches
American culture).


For equality comparison, both ordinal and culture-specific algorithms are useful.
For ordering, however, culture-specific comparison is nearly always preferable: to
order strings alphabetically, you need an alphabet. Ordinal relies on the numeric
Unicode code point values, which happen to put English characters in alphabetical
order, but even then not exactly as you might expect. For example, assuming case
sensitivity, consider the strings "Atom", "atom", and "Zamia". The invariant culture
puts them in the following order:


"atom", "Atom", "Zamia" 
Ordinal arranges them instead as follows: 
"Atom", "Zamia", "atom" 


This is because the invariant culture encapsulates an alphabet, which considers
uppercase characters adjacent to their lowercase counterparts (aAbBcCdD...). The
ordinal algorithm, however, puts all the uppercase characters first, and then all
lowercase characters (A...Z, a...z). This is essentially a throwback to the ASCII
character set invented in the 1960s.
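The following sketch reproduces the two orderings with StringComparer.Ordinal and StringComparer.InvariantCulture; the word list and class name are just for illustration.

```csharp
using System;

public static class OrderingDemo
{
    // Sorts a fixed word list with the supplied comparer.
    public static string[] Sort (StringComparer comparer)
    {
        string[] words = { "Zamia", "atom", "Atom" };
        Array.Sort (words, comparer);
        return words;
    }

    public static void Main()
    {
        // Ordinal compares UTF-16 code units, so every uppercase letter precedes every lowercase one.
        Console.WriteLine (string.Join (", ", Sort (StringComparer.Ordinal)));   // Atom, Zamia, atom

        // The invariant culture uses an alphabet in which case is only a tie-breaker.
        Console.WriteLine (string.Join (", ", Sort (StringComparer.InvariantCulture)));
    }
}
```

The ordinal line is deterministic. The invariant-culture line should print atom, Atom, Zamia as described above, with one caveat: if the runtime runs in globalization-invariant mode (as some minimal container images do), culture-sensitive comparisons fall back to ordinal behavior and both lines match.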


String equality comparison 


Despite ordinal’s limitations, string’s == operator always performs ordinal,
case-sensitive comparison. The same goes for the instance version of string.Equals
when called without a StringComparison argument; this defines the “default” equality
comparison behavior for the string type.


The ordinal algorithm was chosen for string’s == and Equals 
functions because it’s both highly efficient and deterministic. 
String equality comparison is considered fundamental and is 
performed far more frequently than order comparison. 


A strict notion of equality is also consistent with the general 
use of the == operator. 


The following methods allow culture-aware or case-insensitive comparisons: 
public bool Equals (string value, StringComparison comparisonType);


public static bool Equals (string a, string b, 
StringComparison comparisonType); 


The static version is advantageous in that it still works if one or both of the strings 
are null. StringComparison is an enum defined as follows: 


public enum StringComparison
{
  CurrentCulture,               // Case-sensitive
  CurrentCultureIgnoreCase,
  InvariantCulture,             // Case-sensitive
  InvariantCultureIgnoreCase,
  Ordinal,                      // Case-sensitive
  OrdinalIgnoreCase
}


For example: 


Console.WriteLine (string.Equals ("foo", "FOO",
                   StringComparison.OrdinalIgnoreCase));   // True

Console.WriteLine ("ṻ" == "ǖ");                            // False

Console.WriteLine (string.Equals ("ṻ", "ǖ",
                   StringComparison.CurrentCulture));      // ?
(The result of the third example is determined by the computer’s current language 
settings.) 
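The static overload's tolerance of null is easy to demonstrate. In this sketch, SameIgnoringCase is a hypothetical helper; the instance call on a null reference throws before any comparison takes place.

```csharp
using System;

public static class EqualityDemo
{
    // The static overload handles null on either side.
    public static bool SameIgnoringCase (string a, string b) =>
        string.Equals (a, b, StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        Console.WriteLine (SameIgnoringCase ("foo", "FOO"));   // True
        Console.WriteLine (SameIgnoringCase (null, null));     // True
        Console.WriteLine (SameIgnoringCase ("foo", null));    // False

        string missing = null;
        try
        {
            // The instance overload needs a non-null target.
            Console.WriteLine (missing.Equals ("foo", StringComparison.OrdinalIgnoreCase));
        }
        catch (NullReferenceException)
        {
            Console.WriteLine ("NullReferenceException");
        }
    }
}
```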


String order comparison 


String’s CompareTo instance method performs culture-sensitive, case-sensitive order
comparison. Unlike the == operator, CompareTo does not use ordinal comparison:
for ordering, a culture-sensitive algorithm is much more useful. Here’s the method's
definition:


public int CompareTo (string strB);


The CompareTo instance method implements the generic
IComparable interface, a standard comparison protocol used
across the .NET Framework. This means string’s CompareTo
defines the default ordering behavior of strings in such
applications as sorted collections, for instance. For more
information on IComparable, see “Order Comparison” on page 306.


For other kinds of comparison, you can call the static Compare and CompareOrdinal 
methods: 


public static int Compare (string strA, string strB, 
StringComparison comparisonType); 


public static int Compare (string strA, string strB, bool ignoreCase, 
CultureInfo culture); 
public static int Compare (string strA, string strB, bool ignoreCase);
public static int CompareOrdinal (string strA, string strB); 
The last two methods are simply shortcuts for calling the first two methods. 


All of the order comparison methods return a positive number, a negative number, 
or zero depending on whether the first value comes after, before, or alongside the 
second value: 


Console.WriteLine ("Boston".CompareTo ("Austin"));    // 1
Console.WriteLine ("Boston".CompareTo ("Boston"));    // 0
Console.WriteLine ("Boston".CompareTo ("Chicago"));   // -1
Console.WriteLine ("ṻ".CompareTo ("ǖ"));              // 0
Console.WriteLine ("foo".CompareTo ("FOO"));          // -1


The following performs a case-insensitive comparison using the current culture: 
Console.WriteLine (string.Compare ("foo", "FOO", true)); // 0 
By supplying a CultureInfo object, you can plug in any alphabet: 


// CultureInfo is defined in the System.Globalization namespace 


CultureInfo german = CultureInfo.GetCultureInfo ("de-DE");
int i = string.Compare ("Müller", "Muller", false, german);


StringBuilder 


The StringBuilder class (System.Text namespace) represents a mutable (editable)
string. With a StringBuilder, you can Append, Insert, Remove, and Replace
substrings without replacing the whole StringBuilder.


StringBuilder’s constructor optionally accepts an initial string value as well as a
starting size for its internal capacity (default is 16 characters). If you go beyond this,
StringBuilder automatically resizes its internal structures to accommodate (at a
slight performance cost) up to its maximum capacity (default is int.MaxValue).


A popular use of StringBuilder is to build up a long string by repeatedly calling
Append. This approach is much more efficient than repeatedly concatenating
ordinary string types:


StringBuilder sb = new StringBuilder(); 
for (int i = 0; i < 50; i++) sb.Append(i).Append(","); 


To get the final result, call ToString(): 


Console.WriteLine (sb.ToString()); 


0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,


AppendLine performs an Append that adds a newline sequence ("\r\n" in
Windows). AppendFormat accepts a composite format string, just like string.Format.
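As a sketch of typical usage, the following builds a comma-separated list, drops the trailing comma by decrementing Length (StringBuilder's Length is writable), and then uses AppendLine and AppendFormat together. The class and method names are illustrative.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Builds "0,1,2,...,count-1" with no trailing comma.
    public static string JoinNumbers (int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++) sb.Append (i).Append (',');
        if (sb.Length > 0) sb.Length--;          // drop the final ','
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine (JoinNumbers (5));     // 0,1,2,3,4

        var report = new StringBuilder();
        report.AppendLine ("Header");            // "Header" followed by Environment.NewLine
        report.AppendFormat ("{0} + {1} = {2}", 2, 3, 2 + 3);
        Console.WriteLine (report.ToString());
    }
}
```

AppendLine uses Environment.NewLine, so the separator it adds is "\r\n" on Windows and "\n" on Unix-like systems.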







In addition to the Insert, Remove, and Replace methods (Replace works like 
string’s Replace), StringBuilder defines a Length property and a writable indexer 
for getting/setting individual characters. 


To clear the contents of a StringBuilder, either instantiate a new one or set its 
Length to zero. 


Setting a StringBuilder’s Length to zero doesn't shrink its
internal capacity. So, if the StringBuilder previously
contained one million characters, it will continue to occupy
around two megabytes of memory after zeroing its Length. If
you want to release the memory, you must create a new
StringBuilder and allow the old one to drop out of scope
(and be garbage-collected).
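The following sketch illustrates the point: after appending a million characters and setting Length to zero, the builder is empty but its Capacity stays large. The exact capacity figure is an implementation detail of the runtime, so the comment claims only a lower bound; FillThenClear is a hypothetical helper.

```csharp
using System;
using System.Text;

public static class CapacityDemo
{
    public static (int Length, int Capacity) FillThenClear (int size)
    {
        var sb = new StringBuilder();
        sb.Append ('x', size);             // grow the builder to 'size' characters
        sb.Length = 0;                     // clears the text...
        return (sb.Length, sb.Capacity);   // ...but the capacity is retained
    }

    public static void Main()
    {
        var (length, capacity) = FillThenClear (1_000_000);
        Console.WriteLine (length);        // 0
        Console.WriteLine (capacity);      // at least 1,000,000

        // To release the memory, replace the builder instead.
        var fresh = new StringBuilder();
        Console.WriteLine (fresh.Capacity);   // 16 (the default)
    }
}
```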


Text Encodings and Unicode 


A character set is an allocation of characters, each with a numeric code or code point.
There are two character sets in common use: Unicode and ASCII. Unicode has an
address space of approximately one million characters, of which about 100,000 are
currently allocated. Unicode covers most spoken world languages as well as some
historical languages and special symbols. The ASCII set is simply the first 128
characters of the Unicode set, which covers most of what you see on a US-style
keyboard. ASCII predates Unicode by 30 years and is still sometimes used for its
simplicity and efficiency: each character is represented by one byte.


The .NET type system is designed to work with the Unicode character set. ASCII is 
implicitly supported, though, by virtue of being a subset of Unicode. 


A text encoding maps characters from their numeric code point to a binary
representation. In .NET, text encodings come into play primarily when dealing with text
files or streams. When you read a text file into a string, a text encoder translates the
file data from binary into the internal Unicode representation that the char and
string types expect. A text encoding can restrict what characters can be
represented as well as affect storage efficiency.


There are two categories of text encoding in .NET: 


• Those that map Unicode characters to another character set

• Those that use standard Unicode encoding schemes


The first category contains legacy encodings such as IBM’s EBCDIC and 8-bit
character sets with extended characters in the upper-128 region that were popular prior
to Unicode (identified by a code page). The ASCII encoding is also in this category:
it encodes the first 128 characters and drops everything else. This category contains
the nonlegacy GB18030, as well, which is the mandatory standard for applications
written in China (or sold to China) since 2000.
In the second category are UTF-8, UTF-16, and UTF-32 (and the obsolete UTF-7).
Each differs in space efficiency. UTF-8 is the most space-efficient for most kinds of 
text: it uses between one and four bytes to represent each character. The first 128 
characters require only a single byte, making it compatible with ASCII. UTF-8 is the 
most popular encoding for text files and streams (particularly on the internet), and 
it is the default for stream I/O in .NET (in fact, it’s the default for almost everything 
that implicitly uses an encoding). 


UTF-16 uses one or two 16-bit words to represent each character. This is what .NET 
uses internally to represent characters and strings. Some programs also write files in 
UTF-16. 


UTF-32 is the least space-efficient: it maps each code point directly to 32 bits, so 
every character consumes four bytes. UTF-32 is rarely used for this reason. It does, 
however, make random access very easy because every character takes an equal 
number of bytes. 
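To make the size trade-offs concrete, this sketch counts the bytes each encoding needs for four sample characters: an ASCII letter, an accented Latin letter, the euro sign, and an emoji outside the Basic Multilingual Plane (built with char.ConvertFromUtf32, covered later in this section). GetByteCount excludes any byte-order mark; the class name is illustrative.

```csharp
using System;
using System.Text;

public static class EncodingSizeDemo
{
    // Returns { UTF-8, UTF-16, UTF-32 } byte counts for the string.
    public static int[] ByteCounts (string s) => new[]
    {
        Encoding.UTF8.GetByteCount (s),
        Encoding.Unicode.GetByteCount (s),   // UTF-16 (little endian)
        Encoding.UTF32.GetByteCount (s)
    };

    public static void Main()
    {
        // "A" (U+0041), "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600)
        foreach (string s in new[] { "A", "\u00E9", "\u20AC", char.ConvertFromUtf32 (0x1F600) })
            Console.WriteLine (string.Join (" ", ByteCounts (s)));
        // 1 2 4
        // 2 2 4
        // 3 2 4
        // 4 4 4
    }
}
```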


Obtaining an Encoding object 


The Encoding class in System.Text is the common base type for classes that
encapsulate text encodings. There are several subclasses, whose purpose is to encapsulate
families of encodings with similar features. The easiest way to instantiate a correctly
configured class is to call Encoding.GetEncoding with a standard Internet Assigned
Numbers Authority (IANA) Character Set name:


Encoding utf8 = Encoding.GetEncoding ("utf-8"); 
Encoding chinese = Encoding.GetEncoding ("GB18030"); 


The most common encodings can also be obtained through dedicated static
properties on Encoding:


Encoding name    Static property on Encoding
UTF-8            Encoding.UTF8
UTF-16           Encoding.Unicode (not UTF16)
UTF-32           Encoding.UTF32
ASCII            Encoding.ASCII





The static GetEncodings method returns a list of all supported encodings along 
with their standard IANA names: 


foreach (EncodingInfo info in Encoding.GetEncodings()) 
Console.WriteLine (info.Name); 


The other way to obtain an encoding is to directly instantiate an encoding class. 
Doing so allows you to set various options via constructor arguments, including: 
• Whether to throw an exception if an invalid byte sequence is encountered when
  decoding. The default is false.

• Whether to encode/decode UTF-16/UTF-32 with the most significant bytes first
  (big endian) or the least significant bytes first (little endian). The default is
  little endian, the standard on the Windows operating system.

• Whether to emit a byte-order mark (a prefix that indicates endianness).
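The sketch below exercises these options through the UTF8Encoding and UnicodeEncoding constructors: a strict UTF-8 decoder that throws DecoderFallbackException on malformed input (the default Encoding.UTF8 substitutes U+FFFD instead), and a big-endian UTF-16 encoding whose byte-order mark differs from the default little-endian one. TryDecodeStrict is a hypothetical helper.

```csharp
using System;
using System.Text;

public static class EncodingOptionsDemo
{
    // Strict UTF-8: no BOM, and throw on malformed input instead of substituting U+FFFD.
    static readonly Encoding StrictUtf8 =
        new UTF8Encoding (encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static bool TryDecodeStrict (byte[] bytes, out string text)
    {
        try { text = StrictUtf8.GetString (bytes); return true; }
        catch (DecoderFallbackException) { text = null; return false; }
    }

    public static void Main()
    {
        byte[] invalid = { 0x66, 0x6F, 0xFF };   // "fo" followed by a byte that is never valid UTF-8

        Console.WriteLine (Encoding.UTF8.GetString (invalid));  // "fo" plus U+FFFD (replacement char)
        Console.WriteLine (TryDecodeStrict (invalid, out _));   // False

        // UTF-16, big endian, with a byte-order mark:
        var utf16BE = new UnicodeEncoding (bigEndian: true, byteOrderMark: true);
        Console.WriteLine (BitConverter.ToString (utf16BE.GetPreamble()));          // FE-FF
        Console.WriteLine (BitConverter.ToString (Encoding.Unicode.GetPreamble())); // FF-FE
    }
}
```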


Encoding for file and stream I/O


The most common application for an Encoding object is to control how text is read
and written to a file or stream. For example, the following writes “Testing...” to a file
called data.txt in UTF-16 encoding:


System.IO.File.WriteAllText ("data.txt", "Testing...", Encoding.Unicode);


If you omit the final argument, WriteAllText applies the ubiquitous UTF-8
encoding.


UTF-8 is the default text encoding for all file and stream I/O.


We return to this subject in Chapter 15, in “Stream Adapters” on page 653. 
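You can observe the effect of the encoding argument by checking file sizes. This sketch writes "Testing..." to a temporary file (rather than data.txt) and reports its length. It assumes, as is the case on current .NET versions, that WriteAllText emits the byte-order mark for Encoding.Unicode but writes its default UTF-8 without one; WrittenSize is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileEncodingDemo
{
    // Writes the text with the given encoding (or the UTF-8 default when null) and returns the file size.
    public static long WrittenSize (string text, Encoding encoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            if (encoding == null) File.WriteAllText (path, text);
            else File.WriteAllText (path, text, encoding);
            return new FileInfo (path).Length;
        }
        finally { File.Delete (path); }
    }

    public static void Main()
    {
        Console.WriteLine (WrittenSize ("Testing...", Encoding.Unicode));  // 22: 2-byte BOM + 10 chars x 2 bytes
        Console.WriteLine (WrittenSize ("Testing...", null));              // 10: UTF-8, no BOM
    }
}
```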


Encoding to byte arrays 


You can also use an Encoding object to go to and from a byte array. The GetBytes
method converts from string to byte[] with the given encoding; GetString
converts from byte[] to string:


byte[] utf8Bytes  = System.Text.Encoding.UTF8.GetBytes    ("0123456789");
byte[] utf16Bytes = System.Text.Encoding.Unicode.GetBytes ("0123456789");
byte[] utf32Bytes = System.Text.Encoding.UTF32.GetBytes   ("0123456789");

Console.WriteLine (utf8Bytes.Length);    // 10
Console.WriteLine (utf16Bytes.Length);   // 20
Console.WriteLine (utf32Bytes.Length);   // 40

string original1 = System.Text.Encoding.UTF8.GetString    (utf8Bytes);
string original2 = System.Text.Encoding.Unicode.GetString (utf16Bytes);
string original3 = System.Text.Encoding.UTF32.GetString   (utf32Bytes);

Console.WriteLine (original1);           // 0123456789
Console.WriteLine (original2);           // 0123456789
Console.WriteLine (original3);           // 0123456789


UTF-16 and surrogate pairs


Recall that .NET stores characters and strings in UTF-16. Because UTF-16 requires 
one or two 16-bit words per character, and a char is only 16 bits in length, some 
Unicode characters require two chars to represent. This has a couple of
consequences: 


• A string’s Length property can be greater than its real character count.

• A single char is not always enough to fully represent a Unicode character.


Most applications ignore this because nearly all commonly used characters fit into a 
section of Unicode called the Basic Multilingual Plane (BMP), which requires only 
one 16-bit word in UTF-16. The BMP covers several dozen world languages and 
includes more than 30,000 Chinese characters. Excluded are characters of some 
ancient languages, symbols for musical notation, and some less common Chinese 
characters. 


If you need to support two-word characters, the following static methods in char 
convert a 32-bit code point to a string of two chars, and back again: 


string ConvertFromUtf32 (int utf32) 
int ConvertToUtf32 (char highSurrogate, char lowSurrogate) 


Two-word characters are called surrogates. They are easy to spot because each word
is in the range 0xD800 to 0xDFFF. You can use the following static methods in char
to assist:


bool IsSurrogate (char c) 
bool IsHighSurrogate (char c) 
bool IsLowSurrogate (char c) 
bool IsSurrogatePair (char highSurrogate, char lowSurrogate) 


The StringInfo class in the System.Globalization namespace also provides a 
range of methods and properties for working with two-word characters. 
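The following sketch puts these methods together on a single character outside the BMP (U+1F600, an emoji). CountCodePoints is a hypothetical helper that counts a well-formed surrogate pair as one character, unlike Length; StringInfo counts text elements, which for this character is also one.

```csharp
using System;
using System.Globalization;

public static class SurrogateDemo
{
    // Counts code points, treating a valid high/low surrogate pair as one.
    public static int CountCodePoints (string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate (s[i]) && i + 1 < s.Length && char.IsLowSurrogate (s[i + 1]))
                i++;   // skip the low half of the pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        string grinning = char.ConvertFromUtf32 (0x1F600);   // a code point outside the BMP

        Console.WriteLine (grinning.Length);                                  // 2
        Console.WriteLine (char.IsHighSurrogate (grinning[0]));               // True
        Console.WriteLine (char.IsLowSurrogate (grinning[1]));                // True
        Console.WriteLine (char.IsSurrogatePair (grinning[0], grinning[1]));  // True
        Console.WriteLine (char.ConvertToUtf32 (grinning[0], grinning[1]).ToString ("X"));  // 1F600
        Console.WriteLine (new StringInfo (grinning).LengthInTextElements);   // 1
        Console.WriteLine (CountCodePoints ("a" + grinning + "b"));           // 3
    }
}
```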


Characters outside the BMP typically require special fonts and have limited
operating system support.


Dates and Times 


Three immutable structs in the System namespace do the job of representing dates 
and times: DateTime, DateTimeOffset, and TimeSpan. C# doesn't define any special 
keywords that map to these types. 


TimeSpan 


A TimeSpan represents an interval of time—or a time of the day. In the latter role, it’s 
simply the “clock” time (without the date), which is equivalent to the time since 
midnight, assuming no daylight saving transition. A TimeSpan has a resolution of 
100 ns, has a maximum value of about 10 million days, and can be positive or 
negative. 


There are three ways to construct a TimeSpan: 
• Through one of the constructors
• By calling one of the static From... methods
• By subtracting one DateTime from another


Here are the constructors: 


public TimeSpan (int hours, int minutes, int seconds);
public TimeSpan (int days, int hours, int minutes, int seconds);
public TimeSpan (int days, int hours, int minutes, int seconds,
                 int milliseconds);
public TimeSpan (long ticks);   // Each tick = 100ns


The static From... methods are more convenient when you want to specify an
interval in just a single unit, such as minutes, hours, and so on:


public static TimeSpan FromDays (double value);
public static TimeSpan FromHours (double value);
public static TimeSpan FromMinutes (double value);
public static TimeSpan FromSeconds (double value);
public static TimeSpan FromMilliseconds (double value);


For example: 


Console.WriteLine (new TimeSpan (2, 30, 0));     // 02:30:00
Console.WriteLine (TimeSpan.FromHours (2.5));    // 02:30:00
Console.WriteLine (TimeSpan.FromHours (-2.5));   // -02:30:00


TimeSpan overloads the < and > operators as well as the + and - operators. The
following expression evaluates to a TimeSpan of 2.5 hours:


TimeSpan.FromHours(2) + TimeSpan.FromMinutes(30); 
The next expression evaluates to one second short of 10 days: 


TimeSpan.FromDays(10) - TimeSpan.FromSeconds(1); // 9.23:59:59 
Using this expression, we can illustrate the integer properties Days, Hours, Minutes,
Seconds, and Milliseconds:


TimeSpan nearlyTenDays = TimeSpan.FromDays(10) - TimeSpan.FromSeconds(1);

Console.WriteLine (nearlyTenDays.Days);           // 9
Console.WriteLine (nearlyTenDays.Hours);          // 23
Console.WriteLine (nearlyTenDays.Minutes);        // 59
Console.WriteLine (nearlyTenDays.Seconds);        // 59
Console.WriteLine (nearlyTenDays.Milliseconds);   // 0


In contrast, the Total... properties return values of type double describing the
entire time span:


Console.WriteLine (nearlyTenDays.TotalDays);          // 9.99998842592593
Console.WriteLine (nearlyTenDays.TotalHours);         // 239.999722222222
Console.WriteLine (nearlyTenDays.TotalMinutes);       // 14399.9833333333
Console.WriteLine (nearlyTenDays.TotalSeconds);       // 863999
Console.WriteLine (nearlyTenDays.TotalMilliseconds);  // 863999000
The static Parse method does the opposite of ToString, converting a string to a
TimeSpan. TryParse does the same but returns false rather than throwing an
exception if the conversion fails. The XmlConvert class also provides TimeSpan/string
conversion methods that follow standard XML formatting protocols.
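A short sketch of parsing: the invariant "c" format round-trips through ParseExact, TryParse reports failure without an exception, and the comparison operators mentioned earlier behave as expected. RoundTrip is a hypothetical helper, and CultureInfo.InvariantCulture is assumed so that results don't depend on regional settings.

```csharp
using System;
using System.Globalization;

public static class TimeSpanParseDemo
{
    // Formats with the invariant "c" (constant) format and parses the text back.
    public static TimeSpan RoundTrip (TimeSpan value) =>
        TimeSpan.ParseExact (value.ToString ("c"), "c", CultureInfo.InvariantCulture);

    public static void Main()
    {
        TimeSpan nearlyTenDays = TimeSpan.FromDays (10) - TimeSpan.FromSeconds (1);

        Console.WriteLine (nearlyTenDays.ToString ("c"));                  // 9.23:59:59
        Console.WriteLine (RoundTrip (nearlyTenDays) == nearlyTenDays);    // True

        Console.WriteLine (TimeSpan.Parse ("02:30:00", CultureInfo.InvariantCulture)
                           == TimeSpan.FromHours (2.5));                    // True

        // TryParse reports failure through its return value instead of throwing.
        Console.WriteLine (TimeSpan.TryParse ("not a time", out _));        // False

        // The overloaded comparison operators:
        Console.WriteLine (TimeSpan.FromMinutes (90) > TimeSpan.FromHours (1));   // True
    }
}
```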


The default value for a TimeSpan is TimeSpan.Zero.


TimeSpan can also be used to represent the time of the day (the elapsed time since
midnight). To obtain the current time of day, call DateTime.Now.TimeOfDay.


DateTime and DateTimeOffset 


DateTime and DateTimeOffset are immutable structs for representing a date, and 
optionally, a time. They have a resolution of 100 ns, and a range covering the years 
0001 through 9999. 


DateTimeOffset is functionally similar to DateTime. Its distinguishing feature is 
that it also stores a Coordinated Universal Time (UTC) offset; this allows more 
meaningful results when comparing values across different time zones. 


An excellent article on the rationale behind the introduction 
of DateTimeOffset is available on the Microsoft website. The 
title is “A Brief History of DateTime,” by Anthony Moore. 


Choosing between DateTime and DateTimeOffset 


DateTime and DateTimeOffset differ in how they handle time zones. A DateTime 
incorporates a three-state flag indicating whether the DateTime is relative to the 
following: 

• The local time on the current computer
• UTC (the modern equivalent of Greenwich Mean Time)
• Unspecified


A DateTimeOffset is more specific—it stores the offset from UTC as a TimeSpan: 
July 01 2019 03:00:00 -06:00 
This influences equality comparisons, which is the main factor in choosing between 
DateTime and DateTimeOffset. Specifically: 
• DateTime ignores the three-state flag in comparisons and considers two values
  equal if they have the same year, month, day, hour, minute, and so on.

• DateTimeOffset considers two values equal if they refer to the same point in
  time.
Daylight saving time can make this distinction important even
if your application doesn't need to handle multiple geographic 
time zones. 


So, DateTime considers the following two values different, whereas DateTimeOffset 
considers them equal: 


July 01 2019 09:00:00 +00:00 (GMT) 
July 01 2019 03:00:00 -06:00 (local time, Central America) 
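The following sketch confirms this with the two values above: == on DateTimeOffset compares instants, EqualsExact additionally requires the same offset, and comparing the underlying clock times (the DateTime property) treats them as different, just as DateTime would. The class name is illustrative.

```csharp
using System;

public static class OffsetEqualityDemo
{
    public static void Main()
    {
        var gmt     = new DateTimeOffset (2019, 7, 1, 9, 0, 0, TimeSpan.Zero);
        var central = new DateTimeOffset (2019, 7, 1, 3, 0, 0, TimeSpan.FromHours (-6));

        Console.WriteLine (gmt == central);                          // True:  same instant
        Console.WriteLine (gmt.EqualsExact (central));               // False: offsets differ
        Console.WriteLine (gmt.DateTime == central.DateTime);        // False: 09:00 vs 03:00 clock time
        Console.WriteLine (gmt.UtcDateTime == central.UtcDateTime);  // True
    }
}
```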


In most cases, DateTimeOffset’s equality logic is preferable. For example, in
calculating which of two international events is more recent, a DateTimeOffset implicitly
gives the correct answer. Similarly, a hacker plotting a Distributed Denial of Service
attack would reach for a DateTimeOffset! To do the same with DateTime requires
standardizing on a single time zone (typically UTC) throughout your application.
This is problematic for two reasons:


• To be friendly to the end user, UTC DateTimes require explicit conversion to
  local time prior to formatting.

• It’s easy to forget and incorporate a local DateTime.


DateTime is better, though, at specifying a value relative to the local computer at
runtime. Suppose, for example, that you want to schedule an archive at each of your
international offices for next Sunday, at 3 A.M. local time (when there’s least activity).
Here, DateTime would be more suitable because it would respect each site’s local time.


Internally, DateTimeOffset uses a short integer to store the
UTC offset in minutes. It doesn’t store any regional
information, so there's nothing present to indicate whether an offset of
+08:00, for instance, refers to Singapore time or Perth time.


We revisit time zones and equality comparison in more depth in “Dates and Time 
Zones” on page 264. 


SQL Server 2008 introduced direct support for
DateTimeOffset through a new data type of the same name.


Constructing a DateTime 


DateTime defines constructors that accept integers for the year, month, and day— 
and optionally, the hour, minute, second, and millisecond: 


public DateTime (int year, int month, int day); 


public DateTime (int year, int month, int day, 
int hour, int minute, int second, int millisecond); 
If you specify only a date, the time is implicitly set to midnight (0:00).


The DateTime constructors also allow you to specify a DateTimeKind—an enum 
with the following values: 


Unspecified, Local, Utc 


This corresponds to the three-state flag described in the preceding section. 
Unspecified is the default, and it means that the DateTime is time-zone-agnostic. 
Local means relative to the local time zone on the current computer. A local
DateTime does not include information about which particular time zone it refers to, nor,
unlike DateTimeOffset, the numeric offset from UTC. 


A DateTime’s Kind property returns its DateTimeKind. 


DateTime’s constructors are also overloaded to accept a Calendar object.
This allows you to specify a date using any of the Calendar subclasses defined in
System.Globalization:


DateTime d = new DateTime (5767, 1, 1, 
new System.Globalization.HebrewCalendar()); 


Console.WriteLine (d); // 12/12/2006 12:00:00 AM 


(The formatting of the date in this example depends on your computer’s control
panel settings.) A DateTime always uses the default Gregorian calendar; in this
example, the Hebrew date is converted once, during construction. To perform
computations using another calendar, you must use the methods on the Calendar
subclass itself.


You can also construct a DateTime with a single ticks value of type long, where ticks
is the number of 100-ns intervals from midnight 01/01/0001.


For interoperability, DateTime provides the static FromFileTime and
FromFileTimeUtc methods for converting from a Windows file time (specified as a long)
and FromOADate for converting from an OLE automation date/time (specified as a
double).


To construct a DateTime from a string, call the static Parse or ParseExact method. 
Both methods accept optional flags and format providers; ParseExact also accepts a 
format string. We discuss parsing in greater detail in “Formatting and Parsing” on 
page 270. 


Constructing a DateTimeOffset 


DateTimeOffset has a similar set of constructors. The difference is that you also 
specify a UTC offset as a TimeSpan: 


public DateTimeOffset (int year, int month, int day, 
int hour, int minute, int second, 
TimeSpan offset); 
public DateTimeOffset (int year, int month, int day,
int hour, int minute, int second, int millisecond, 
TimeSpan offset); 


The TimeSpan must amount to a whole number of minutes; otherwise an exception 
is thrown. 


DateTimeOffset also has constructors that accept a Calendar object or a long ticks
value, as well as static Parse and ParseExact methods that accept a string.


You can construct a DateTimeOffset from an existing DateTime either by using 
these constructors: 


public DateTimeOffset (DateTime dateTime); 
public DateTimeOffset (DateTime dateTime, TimeSpan offset); 


or with an implicit cast: 
DateTimeOffset dt = new DateTime (2000, 2, 3); 


The implicit cast from DateTime to DateTimeOffset is handy 
because most of the .NET Framework supports DateTime— 
not DateTimeOffset. 


If you don’t specify an offset, it’s inferred from the DateTime value using these rules:


• If the DateTime has a DateTimeKind of Utc, the offset is zero.

• If the DateTime has a DateTimeKind of Local or Unspecified (the default), the
  offset is taken from the current local time zone.


To convert in the other direction, DateTimeOffset provides three properties that 
return values of type DateTime: 


• The UtcDateTime property returns a DateTime in UTC time.

• The LocalDateTime property returns a DateTime in the current local time zone
  (converting it if necessary).

• The DateTime property returns a DateTime in whatever zone it was specified,
  with a Kind of Unspecified (i.e., it returns the UTC time plus the offset).
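This sketch shows the three properties on a Central America value, plus the implicit conversion in the other direction, where a Utc DateTime receives a zero offset. LocalDateTime depends on the machine's time zone, so only its Kind is printed; the class name is illustrative.

```csharp
using System;

public static class OffsetConversionDemo
{
    public static void Main()
    {
        var dto = new DateTimeOffset (2019, 7, 1, 3, 0, 0, TimeSpan.FromHours (-6));

        DateTime utc   = dto.UtcDateTime;     // 2019-07-01 09:00, Kind = Utc
        DateTime clock = dto.DateTime;        // 2019-07-01 03:00, Kind = Unspecified
        DateTime local = dto.LocalDateTime;   // depends on this machine's time zone

        Console.WriteLine ($"{utc:s} {utc.Kind}");       // 2019-07-01T09:00:00 Utc
        Console.WriteLine ($"{clock:s} {clock.Kind}");   // 2019-07-01T03:00:00 Unspecified
        Console.WriteLine (local.Kind);                  // Local

        // The other direction: a Utc DateTime converts implicitly with a zero offset.
        DateTimeOffset fromUtc = new DateTime (2000, 2, 3, 0, 0, 0, DateTimeKind.Utc);
        Console.WriteLine (fromUtc.Offset);              // 00:00:00
    }
}
```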


The current DateTime/DateTimeOffset 


Both DateTime and DateTimeOffset have a static Now property that returns the
current date and time:


Console.WriteLine (DateTime.Now);         // 11/11/2019 1:23:45 PM
Console.WriteLine (DateTimeOffset.Now);   // 11/11/2019 1:23:45 PM -06:00


DateTime also provides a Today property that returns just the date portion: 


Console.WriteLine (DateTime.Today);          // 11/11/2019 12:00:00 AM

The static UtcNow property returns the current date and time in UTC:


Console.WriteLine (DateTime.UtcNow);         // 11/11/2019 7:23:45 AM
Console.WriteLine (DateTimeOffset.UtcNow);   // 11/11/2019 7:23:45 AM +00:00


The precision of all these methods depends on the operating system and is typically 
in the 10 to 20 ms region. 


Working with dates and times 


DateTime and DateTimeOffset provide a similar set of instance properties that 
return various date/time elements: 


DateTime dt = new DateTime (2000, 2, 3, 
10, 20, 30); 


Console.WriteLine (dt.Year); // 2000 
Console.WriteLine (dt.Month); // 2 
Console.WriteLine (dt.Day); // 3 


Console.WriteLine (dt.DayOfWeek); // Thursday
Console.WriteLine (dt.DayOfYear); // 34 


Console.WriteLine (dt.Hour); // 10 
Console.WriteLine (dt.Minute); // 20 
Console.WriteLine (dt.Second); // 30 
Console.WriteLine (dt.Millisecond); // 0 
Console.WriteLine (dt.Ticks); // 630851700300000000 


Console.WriteLine (dt.TimeOfDay); // 10:20:30 (returns a TimeSpan)

DateTimeOffset also has an Offset property of type TimeSpan.


Both types provide the following instance methods to perform computations (most 
accept an argument of type double or int): 


AddYears AddMonths AddDays 
AddHours AddMinutes AddSeconds AddMilliseconds AddTicks 


These all return a new DateTime or DateTimeOffset, and they take into account 
such things as leap years. You can pass in a negative value to subtract. 


The Add method adds a TimeSpan to a DateTime or DateTimeOffset. The + operator 
is overloaded to do the same job: 


TimeSpan ts = TimeSpan.FromMinutes (90); 
Console.WriteLine (dt.Add (ts)); 
Console.WriteLine (dt + ts); // same as above 


You can also subtract a TimeSpan from a DateTime/DateTimeOffset and subtract 
one DateTime/DateTimeOffset from another. The latter gives you a TimeSpan: 


DateTime thisYear = new DateTime (2015, 1, 1); 
DateTime nextYear = thisYear.AddYears (1); 
TimeSpan oneYear = nextYear - thisYear; 
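A few consequences of these methods' calendar awareness are shown below: AddMonths clamps to the end of a shorter month (including February 29 in a leap year), negative arguments subtract, and a "year" obtained by subtraction is 365 or 366 days. The Iso helper is illustrative and formats with the invariant culture so that the output doesn't depend on regional settings.

```csharp
using System;
using System.Globalization;

public static class DateArithmeticDemo
{
    static string Iso (DateTime d) => d.ToString ("yyyy-MM-dd", CultureInfo.InvariantCulture);

    public static void Main()
    {
        var jan31 = new DateTime (2000, 1, 31);

        // AddMonths clamps to the last valid day of the target month (2000 is a leap year).
        Console.WriteLine (Iso (jan31.AddMonths (1)));                // 2000-02-29
        Console.WriteLine (Iso (jan31.AddYears (1).AddMonths (1)));   // 2001-02-28

        // A negative argument subtracts.
        Console.WriteLine (Iso (jan31.AddDays (-31)));                // 1999-12-31

        // Subtracting two DateTimes yields a TimeSpan, so "one year" varies in length.
        Console.WriteLine ((new DateTime (2016, 1, 1) - new DateTime (2015, 1, 1)).Days);   // 365
        Console.WriteLine ((new DateTime (2017, 1, 1) - new DateTime (2016, 1, 1)).Days);   // 366
    }
}
```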
Formatting and parsing DateTimes


Calling ToString on a DateTime formats the result as a short date (all numbers)
followed by a long time (including seconds); for example:


11/11/2019 11:50:30 AM 


The operating system's control panel, by default, determines such things as whether
the day, month, or year comes first, the use of leading zeros, and whether 12- or
24-hour time is used.


Calling ToString on a DateTimeOffset is the same, except that the offset is also 
returned: 


11/11/2019 11:50:30 AM -06:00 


The ToShortDateString and ToLongDateString methods return just the date
portion. The long date format is also determined by the control panel; an example is
“Wednesday, 11 November 2015”. ToShortTimeString and ToLongTimeString
return just the time portion, such as 17:10:10 (the former excludes seconds).


These four methods are actually shortcuts to four different format
strings. ToString is overloaded to accept a format string and provider, allowing you
to specify a wide range of options and control how regional settings are applied. We
describe this in “Formatting and Parsing” on page 270.


DateTimes and DateTimeOffsets can be misparsed if the
culture settings differ from those in force when formatting takes
place. You can avoid this problem by using ToString in
conjunction with a format string that ignores culture settings
(such as “o”):


DateTime dt1 = DateTime.Now;
string cannotBeMisparsed = dt1.ToString ("o");
DateTime dt2 = DateTime.Parse (cannotBeMisparsed);


The static Parse/TryParse and ParseExact/TryParseExact methods do the reverse
of ToString, converting a string to a DateTime or DateTimeOffset. These methods
are also overloaded to accept a format provider. The Try* methods return false
instead of throwing a FormatException.
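As a sketch of a culture-safe round trip, the following formats a UTC DateTime with "o" and parses it back. It adds DateTimeStyles.RoundtripKind, which the snippet above doesn't use, so that the parsed value keeps Kind = Utc instead of being converted to local time; TryParseExact then shows the non-throwing failure path. The class name is illustrative.

```csharp
using System;
using System.Globalization;

public static class RoundTripDemo
{
    public static void Main()
    {
        var original = new DateTime (2019, 11, 11, 11, 50, 30, DateTimeKind.Utc);

        string text = original.ToString ("o");
        Console.WriteLine (text);   // 2019-11-11T11:50:30.0000000Z

        // RoundtripKind restores the Kind (Utc here) rather than converting to local time.
        DateTime parsed = DateTime.Parse (text, CultureInfo.InvariantCulture,
                                          DateTimeStyles.RoundtripKind);
        Console.WriteLine (parsed == original && parsed.Kind == DateTimeKind.Utc);   // True

        // The Try* variants return false instead of throwing a FormatException.
        Console.WriteLine (DateTime.TryParseExact ("11/31/2019", "MM/dd/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out _));   // False (no November 31)
    }
}
```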


Null DateTime and DateTimeOffset values 
Because DateTime and DateTimeOffset are structs, they are not intrinsically
nullable. When you need nullability, there are two ways around this:

• Use a Nullable type (i.e., DateTime? or DateTimeOffset?)

• Use the static field DateTime.MinValue or DateTimeOffset.MinValue (the
  default values for these types)
A nullable type is usually the best approach because the compiler helps to prevent
mistakes. DateTime.MinValue is useful for backward compatibility with code written
prior to C# 2.0 (when nullable value types were introduced).
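A small sketch of the difference, using hypothetical Describe and DescribeLegacy methods: because DateTime? does not convert implicitly to DateTime, the nullable version must unwrap the value explicitly, whereas the MinValue version relies on every caller remembering the sentinel check:

```csharp
using System;
using System.Globalization;

static class NullableDateDemo
{
  // DateTime? makes "no value" explicit; .Value must be accessed deliberately.
  public static string Describe (DateTime? due) =>
    due.HasValue
      ? due.Value.ToString ("yyyy-MM-dd", CultureInfo.InvariantCulture)
      : "no due date";

  // MinValue convention: nothing forces a caller to perform this check.
  public static string DescribeLegacy (DateTime due) =>
    due == DateTime.MinValue
      ? "no due date"
      : due.ToString ("yyyy-MM-dd", CultureInfo.InvariantCulture);

  static void Main()
  {
    Console.WriteLine (Describe (null));                          // no due date
    Console.WriteLine (Describe (new DateTime (2019, 11, 11)));   // 2019-11-11
    Console.WriteLine (DescribeLegacy (DateTime.MinValue));       // no due date
    Console.WriteLine (default (DateTime) == DateTime.MinValue);  // True
  }
}
```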


Calling ToUniversalTime or ToLocalTime on a
DateTime.MinValue can result in it no longer being
DateTime.MinValue (depending on which side of GMT you are
on). If you’re right on GMT (England, outside daylight
saving), the problem won’t arise at all because local and UTC
times are the same. This is your compensation for the English
winter!


Dates and Time Zones 


In this section, we examine in more detail how time zones influence DateTime and 
DateTimeOffset. We also look at the TimeZone and TimeZoneInfo types, which pro- 
vide information on time zone offsets and daylight saving time. 


DateTime and Time Zones 


DateTime is simplistic in its handling of time zones. Internally, it stores a DateTime 
using two pieces of information: 


• A 62-bit number, indicating the number of ticks since 1/1/0001

• A 2-bit enum, indicating the DateTimeKind (Unspecified, Local, or Utc)


When you compare two DateTime instances, only their ticks values are compared; 
their DateTimeKinds are ignored: 


DateTime dt1 = new DateTime (2000, 1, 1, 10, 20, 30, DateTimeKind.Local); 
DateTime dt2 = new DateTime (2000, 1, 1, 10, 20, 30, DateTimeKind.Utc); 
Console.WriteLine (dt1 == dt2); // True 

DateTime local = DateTime.Now; 

DateTime utc = local.ToUniversalTime(); 

Console.WriteLine (local == utc); // False 


The instance methods ToUniversalTime/ToLocalTime convert to universal/local
time. These apply the computer’s current time zone settings and return a new
DateTime with a DateTimeKind of Utc or Local. No conversion happens if you call
ToUniversalTime on a DateTime that’s already Utc, or ToLocalTime on a DateTime
that’s already Local. You will get a conversion, however, if you call
ToUniversalTime or ToLocalTime on a DateTime that’s Unspecified
(ToUniversalTime assumes it is local time; ToLocalTime assumes it is UTC).


You can construct a DateTime that differs from another only in Kind with the static
DateTime.SpecifyKind method:


DateTime d = new DateTime (2015, 12, 12); // Unspecified 
DateTime utc = DateTime.SpecifyKind (d, DateTimeKind.Utc); 
Console.WriteLine (utc); // 12/12/2015 12:00:00 AM 
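Putting the Kind rules together, the following hypothetical AsUtc helper normalizes values to UTC. It assumes (an assumption of this sketch, not a general rule) that any Unspecified value is already UTC, so it relabels such values with SpecifyKind instead of letting ToUniversalTime treat them as local time:

```csharp
using System;

static class KindDemo
{
  // Hypothetical helper. Assumption: an Unspecified value already holds UTC
  // (for example, it was read from storage that always records UTC).
  public static DateTime AsUtc (DateTime dt) =>
    dt.Kind == DateTimeKind.Unspecified
      ? DateTime.SpecifyKind (dt, DateTimeKind.Utc)  // relabel only; ticks unchanged
      : dt.ToUniversalTime();                        // Local: converted; Utc: unchanged

  static void Main()
  {
    var stored = new DateTime (2015, 12, 12, 8, 0, 0);   // Unspecified
    DateTime utc = AsUtc (stored);
    Console.WriteLine (utc.Kind);                        // Utc
    Console.WriteLine (utc.Ticks == stored.Ticks);       // True
    Console.WriteLine (AsUtc (utc) == utc);              // True
    // Comparisons ignore Kind, so these are equal despite different kinds:
    Console.WriteLine (stored == utc);                   // True
  }
}
```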
DateTimeOffset and Time Zones


Internally, DateTimeOffset comprises a DateTime field whose value is always in 
UTC, and a 16-bit integer field for the UTC offset in minutes. Comparisons look 
only at the (UTC) DateTime; the Offset is used primarily for formatting. 


The ToUniversalTime/ToLocalTime methods return a DateTimeOffset representing
the same point in time but with a UTC or local offset. Unlike with DateTime,
these methods don’t affect the underlying date/time value, only the offset:


DateTimeOffset local = DateTimeOffset.Now;
DateTimeOffset utc   = local.ToUniversalTime();

Console.WriteLine (local.Offset);   // -06:00:00 (in Central America)
Console.WriteLine (utc.Offset);     // 00:00:00

Console.WriteLine (local == utc);   // True

To include the Offset in the comparison, you must use the EqualsExact method: 


Console.WriteLine (local.EqualsExact (utc)); // False 
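The same behavior can be shown without depending on the machine’s time zone by building a DateTimeOffset with an explicit offset. This sketch uses a hypothetical Compare method that reports both kinds of equality:

```csharp
using System;

static class OffsetDemo
{
  // Reports whether two values denote the same instant, and whether they
  // also share the same offset (EqualsExact).
  public static (bool SameInstant, bool Exact) Compare (DateTimeOffset a, DateTimeOffset b) =>
    (a == b, a.EqualsExact (b));

  static void Main()
  {
    // 10:00 at UTC-06:00 is the same instant as 16:00 at UTC+00:00.
    var central = new DateTimeOffset (2019, 1, 1, 10, 0, 0, TimeSpan.FromHours (-6));
    DateTimeOffset utc = central.ToUniversalTime();

    Console.WriteLine (utc.Hour);      // 16
    Console.WriteLine (utc.Offset);    // 00:00:00
    var (same, exact) = Compare (central, utc);
    Console.WriteLine (same);          // True
    Console.WriteLine (exact);         // False
  }
}
```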


TimeZone and TimeZoneInfo


The TimeZone and TimeZoneInfo classes provide information on time zone names, 
UTC offsets, and daylight saving time rules. TimeZoneInfo is the more powerful of 
the two. 


The biggest difference between the two types is that TimeZone lets you access only 
the current local time zone, whereas TimeZoneInfo provides access to all the world’s 
time zones. Further, TimeZoneInfo exposes a richer (although at times, more awk- 
ward) rules-based model for describing daylight saving time. 


TimeZone 
The static TimeZone.CurrentTimeZone property returns a TimeZone object based on
the current local settings. The following demonstrates the result if run in California:


TimeZone zone = TimeZone.CurrentTimeZone;
Console.WriteLine (zone.StandardName);   // Pacific Standard Time
Console.WriteLine (zone.DaylightName);   // Pacific Daylight Time


The IsDaylightSavingTime and GetUtcOffset methods work as follows: 


DateTime dt1 = new DateTime (2019, 1, 1);
DateTime dt2 = new DateTime (2019, 6, 1);

Console.WriteLine (zone.IsDaylightSavingTime (dt1));   // False
Console.WriteLine (zone.IsDaylightSavingTime (dt2));   // True
Console.WriteLine (zone.GetUtcOffset (dt1));           // -08:00:00
Console.WriteLine (zone.GetUtcOffset (dt2));           // -07:00:00


The GetDaylightChanges method returns specific daylight saving time information 
for a given year: 
DaylightTime day = zone.GetDaylightChanges (2019);

Console.WriteLine (day.Start.ToString ("M"));   // 10 March
Console.WriteLine (day.End.ToString ("M"));     // 03 November
Console.WriteLine (day.Delta);                  // 01:00:00


TimeZoneInfo


The TimeZoneInfo class works in a similar manner. TimeZoneInfo.Local returns 
the current local time zone: 


TimeZoneInfo zone = TimeZoneInfo.Local; 
Console.WriteLine (zone.StandardName) ; // Pacific Standard Time 
Console.WriteLine (zone.DaylightName) ; // Pacific Daylight Time 


TimeZoneInfo also provides IsDaylightSavingTime and GetUtcOffset methods— 
the difference is that they accept either a DateTime or a DateTimeOffset. 


You can obtain a TimeZoneInfo for any of the world’s time zones by calling
FindSystemTimeZoneById with the zone ID. This feature is unique to TimeZoneInfo, as
is everything else that we demonstrate from this point on. We’ll switch to Western
Australia for reasons that will soon become clear:


TimeZoneInfo wa = TimeZoneInfo.FindSystemTimeZoneById
                  ("W. Australia Standard Time");

Console.WriteLine (wa.Id);                          // W. Australia Standard Time
Console.WriteLine (wa.DisplayName);                 // (GMT+08:00) Perth
Console.WriteLine (wa.BaseUtcOffset);               // 08:00:00
Console.WriteLine (wa.SupportsDaylightSavingTime);  // True


The Id property corresponds to the value passed to FindSystemTimeZoneById. The 
static GetSystemTimeZones method returns all world time zones; hence, you can list 
all valid zone ID strings as follows: 


foreach (TimeZoneInfo z in TimeZoneInfo.GetSystemTimeZones()) 
Console.WriteLine (z.Id); 


You can also create a custom time zone by calling
TimeZoneInfo.CreateCustomTimeZone. Because TimeZoneInfo is
immutable, you must pass in all the relevant data as method
arguments.


You can serialize a predefined or custom time zone to a (semi)
human-readable string by calling ToSerializedString, and
deserialize it by calling TimeZoneInfo.FromSerializedString.


The static ConvertTime method converts a DateTime or DateTimeOffset from one 
time zone to another. You can include either just a destination TimeZoneInfo, or 
both source and destination TimeZoneInfo objects. You can also convert directly 
from or to UTC with the methods ConvertTimeFromUtc and ConvertTimeToUtc. 
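System time zone IDs vary by platform: “W. Australia Standard Time” is a Windows ID, whereas Linux and macOS use IANA IDs such as “Australia/Perth”. To keep the following sketch independent of the host, it builds a custom zone with CreateCustomTimeZone (the ID “Example UTC+08” and the display names are made up), converts to and from it, and round-trips it through ToSerializedString and FromSerializedString:

```csharp
using System;

static class ConvertDemo
{
  // Hypothetical zone: fixed +08:00 offset, no daylight saving rules.
  public static readonly TimeZoneInfo Plus8 = TimeZoneInfo.CreateCustomTimeZone (
    "Example UTC+08", TimeSpan.FromHours (8), "(UTC+08:00) Example", "Example Standard Time");

  static void Main()
  {
    var utc = new DateTime (2019, 11, 11, 3, 0, 0, DateTimeKind.Utc);

    // UTC -> zone; the result's Kind is Unspecified.
    DateTime wall = TimeZoneInfo.ConvertTimeFromUtc (utc, Plus8);
    Console.WriteLine (wall.Hour);                                           // 11

    // Zone -> UTC round-trips for a zone without daylight saving.
    Console.WriteLine (TimeZoneInfo.ConvertTimeToUtc (wall, Plus8) == utc);  // True

    // With DateTimeOffset, ConvertTime changes the offset, not the instant.
    DateTimeOffset there = TimeZoneInfo.ConvertTime (new DateTimeOffset (utc), Plus8);
    Console.WriteLine (there.Offset);                                        // 08:00:00

    // Serialize the custom zone to a string and rebuild it.
    TimeZoneInfo copy = TimeZoneInfo.FromSerializedString (Plus8.ToSerializedString());
    Console.WriteLine (copy.Id);                                             // Example UTC+08
    Console.WriteLine (copy.BaseUtcOffset);                                  // 08:00:00
  }
}
```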
For working with daylight saving time, TimeZoneInfo provides the following
additional methods:

• IsInvalidTime returns true if a DateTime is within the hour (or delta) that’s
skipped when the clocks move forward.

• IsAmbiguousTime returns true if a DateTime or DateTimeOffset is within the
hour (or delta) that’s repeated when the clocks move back.

• GetAmbiguousTimeOffsets returns an array of TimeSpans representing the
valid offset choices for an ambiguous DateTime or DateTimeOffset.
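To see these methods in action without relying on the local machine’s rules, the sketch below builds a hypothetical zone modeled on US Pacific rules (UTC-08:00, one hour forward at 2:00 on the second Sunday in March, back at 2:00 on the first Sunday in November) using AdjustmentRule.CreateAdjustmentRule and TransitionTime.CreateFloatingDateRule. The zone’s ID and names are invented for illustration:

```csharp
using System;

static class DstDemo
{
  public static readonly TimeZoneInfo Zone = CreateZone();

  // Hypothetical zone; ID and display names are made up for this sketch.
  static TimeZoneInfo CreateZone()
  {
    var twoAm = new DateTime (1, 1, 1, 2, 0, 0);   // time-of-day only (year 1, Jan 1)
    var start = TimeZoneInfo.TransitionTime.CreateFloatingDateRule (twoAm, 3, 2, DayOfWeek.Sunday);
    var end   = TimeZoneInfo.TransitionTime.CreateFloatingDateRule (twoAm, 11, 1, DayOfWeek.Sunday);
    var rule  = TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule (
      DateTime.MinValue.Date, DateTime.MaxValue.Date, TimeSpan.FromHours (1), start, end);

    return TimeZoneInfo.CreateCustomTimeZone ("Example Pacific", TimeSpan.FromHours (-8),
      "(UTC-08:00) Example", "Example Standard Time", "Example Daylight Time", new[] { rule });
  }

  static void Main()
  {
    // In 2019 this rule moves clocks forward on March 10 and back on November 3.
    Console.WriteLine (Zone.IsInvalidTime (new DateTime (2019, 3, 10, 2, 30, 0)));    // True
    Console.WriteLine (Zone.IsAmbiguousTime (new DateTime (2019, 11, 3, 1, 30, 0)));  // True
    Console.WriteLine (Zone.IsAmbiguousTime (new DateTime (2019, 11, 3, 3, 30, 0)));  // False

    // The two candidate offsets for the repeated hour: -08:00:00 and -07:00:00
    foreach (TimeSpan offset in Zone.GetAmbiguousTimeOffsets (new DateTime (2019, 11, 3, 1, 30, 0)))
      Console.WriteLine (offset);
  }
}
```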


Unlike with TimeZone, you can’t obtain simple dates from a TimeZoneInfo indicating
the start and end of daylight saving time. Instead, you must call
GetAdjustmentRules, which returns a declarative summary of all daylight saving rules
that apply to all years. Each rule has a DateStart and DateEnd indicating the date
range within which the rule is valid:


foreach (TimeZoneInfo.AdjustmentRule rule in wa.GetAdjustmentRules())
  Console.WriteLine ("Rule: applies from " + rule.DateStart +
                     " to " + rule.DateEnd);


Western Australia first introduced daylight saving time in 2006, midseason (and 
then rescinded it in 2009). This required a special rule for the first year; hence, there 
are two rules: 


Rule: applies from 1/01/2006 12:00:00 AM to 31/12/2006 12:00:00 AM 
Rule: applies from 1/01/2007 12:00:00 AM to 31/12/2009 12:00:00 AM 


Each AdjustmentRule has a DaylightDelta property of type TimeSpan (this is one
hour in almost every case) and properties called DaylightTransitionStart and
DaylightTransitionEnd. The latter two are of type TimeZoneInfo.TransitionTime,
which has the following properties:


public bool IsFixedDateRule { get; }
public DayOfWeek DayOfWeek { get; }
public int Week { get; }
public int Day { get; }
public int Month { get; }
public DateTime TimeOfDay { get; }


A transition time is somewhat complicated in that it needs to represent both fixed 
and floating dates. An example of a floating date is “the last Sunday in March.” Here 
are the rules for interpreting a transition time: 


1. If, for an end transition, IsFixedDateRule is true, Day is 1, Month is 1, and 
TimeOfDay is DateTime.MinValue, there is no end to daylight saving time in 
that year (this can happen only in the southern hemisphere, upon the initial 
introduction of daylight saving time to a region). 


2. Otherwise, if IsFixedDateRule is true, the Month, Day, and TimeOfDay proper- 
ties determine the start or end of the adjustment rule. 
3. Otherwise, if IsFixedDateRule is false, the Month, DayOfWeek, Week, and
TimeOfDay properties determine the start or end of the adjustment rule.


In the last case, Week refers to the week of the month, with “5” meaning the last 
week. We can demonstrate this by enumerating the adjustment rules for our wa time 
zone: 


foreach (TimeZoneInfo.AdjustmentRule rule in wa.GetAdjustmentRules())
{
  Console.WriteLine ("Rule: applies from " + rule.DateStart +
                                    " to " + rule.DateEnd);

  Console.WriteLine ("   Delta: " + rule.DaylightDelta);

  Console.WriteLine ("   Start: " + FormatTransitionTime
                                   (rule.DaylightTransitionStart, false));

  Console.WriteLine ("   End:   " + FormatTransitionTime
                                   (rule.DaylightTransitionEnd, true));
  Console.WriteLine();
}


In FormatTransitionTime, we honor the rules just described: 


static string FormatTransitionTime (TimeZoneInfo.TransitionTime tt,
                                    bool endTime)
{
  if (endTime && tt.IsFixedDateRule
              && tt.Day == 1 && tt.Month == 1
              && tt.TimeOfDay == DateTime.MinValue)
    return "-";

  string s;
  if (tt.IsFixedDateRule)
    s = tt.Day.ToString();
  else
    s = "The " +
        "first second third fourth last".Split() [tt.Week - 1] +
        " " + tt.DayOfWeek + " in";

  return s + " " + DateTimeFormatInfo.CurrentInfo.MonthNames [tt.Month-1]
           + " at " + tt.TimeOfDay.TimeOfDay;
}
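FormatTransitionTime describes a rule in words. To turn a rule into an actual date for a given year, the same cases can be applied in code. The following Resolve method is a hypothetical helper (not part of TimeZoneInfo) that handles fixed rules and floating rules, including Week 5 meaning “last”; it deliberately ignores the “no end to daylight saving” case from rule 1:

```csharp
using System;
using System.Globalization;

static class TransitionDemo
{
  // Hypothetical helper: resolves a TransitionTime to a concrete date/time in a year.
  public static DateTime Resolve (TimeZoneInfo.TransitionTime tt, int year)
  {
    if (tt.IsFixedDateRule)                     // rule 2: Month, Day, TimeOfDay
      return new DateTime (year, tt.Month, tt.Day) + tt.TimeOfDay.TimeOfDay;

    // Rule 3: find the first matching DayOfWeek in the month...
    var first = new DateTime (year, tt.Month, 1);
    int toFirstMatch = ((int) tt.DayOfWeek - (int) first.DayOfWeek + 7) % 7;
    // ...then advance (Week - 1) whole weeks.
    DateTime day = first.AddDays (toFirstMatch + 7 * (tt.Week - 1));
    // Week 5 means "last": if we overshot into the next month, step back a week.
    if (day.Month != tt.Month) day = day.AddDays (-7);
    return day + tt.TimeOfDay.TimeOfDay;
  }

  static void Main()
  {
    var twoAm = new DateTime (1, 1, 1, 2, 0, 0);
    var secondSundayMarch = TimeZoneInfo.TransitionTime.CreateFloatingDateRule (twoAm, 3, 2, DayOfWeek.Sunday);
    var lastSundayOctober = TimeZoneInfo.TransitionTime.CreateFloatingDateRule (twoAm, 10, 5, DayOfWeek.Sunday);
    var firstOfApril      = TimeZoneInfo.TransitionTime.CreateFixedDateRule (twoAm, 4, 1);

    var inv = CultureInfo.InvariantCulture;
    Console.WriteLine (Resolve (secondSundayMarch, 2019).ToString ("yyyy-MM-dd HH:mm", inv)); // 2019-03-10 02:00
    Console.WriteLine (Resolve (lastSundayOctober, 2019).ToString ("yyyy-MM-dd HH:mm", inv)); // 2019-10-27 02:00
    Console.WriteLine (Resolve (firstOfApril, 2019).ToString ("yyyy-MM-dd HH:mm", inv));      // 2019-04-01 02:00
  }
}
```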


Daylight Saving Time and DateTime 


If you use a DateTimeOffset or a UTC DateTime, equality comparisons are unimpeded
by the effects of daylight saving time. But with local DateTimes, daylight saving
can be problematic.


We can summarize the rules as follows:

• Daylight saving affects local time but not UTC time.

• When the clocks turn back, comparisons that rely on time moving forward will
break if (and only if) they use local DateTimes.

• You can always reliably round-trip between UTC and local times (on the same
computer), even as the clocks turn back.


The IsDaylightSavingTime method tells you whether a given local DateTime is subject
to daylight saving time. UTC times always return false:


Console.Write (DateTime.Now.IsDaylightSavingTime()); // True or false 
Console.Write (DateTime.UtcNow.IsDaylightSavingTime()); // Always false 


Assuming dto is a DateTimeOffset, the following expression does the same:

dto.LocalDateTime.IsDaylightSavingTime()


The end of daylight saving time presents a particular complication for algorithms 
that use local time. When the clocks go back, the same hour (or more precisely, 
Delta) repeats itself. We can demonstrate this by instantiating a DateTime right in 
the “twilight zone” on your computer, and then subtracting Delta (this example 
requires that you practice daylight saving time to be interesting!): 


DaylightTime changes = TimeZone.CurrentTimeZone.GetDaylightChanges (2010); 
TimeSpan halfDelta = new TimeSpan (changes.Delta.Ticks / 2); 

DateTime utc1 = changes.End.ToUniversalTime() - halfDelta; 

DateTime utc2 = utc1 - changes.Delta; 


Converting these variables to local times demonstrates why you should use UTC 
and not local time if your code relies on time moving forward: 


DateTime loc1 = utc1.ToLocalTime();   // (Pacific Standard Time)
DateTime loc2 = utc2.ToLocalTime(); 

Console.WriteLine (loc1); // 2/11/2010 1:30:00 AM 
Console.WriteLine (loc2); // 2/11/2010 1:30:00 AM 
Console.WriteLine (loc1 == loc2); // True 


Despite loc1 and loc2 reporting as equal, they are different inside. DateTime 
reserves a special bit for indicating on which side of the twilight zone an ambiguous 
local date lies! This bit is ignored in comparison—as we just saw—but comes into 
play when you format the DateTime unambiguously: 


Console.Write (loc1.ToString ("o"));   // 2010-11-02T02:30:00.0000000-08:00
Console.Write (loc2.ToString ("o"));   // 2010-11-02T02:30:00.0000000-07:00


This bit also is read when you convert back to UTC, ensuring perfect round- 
tripping between local and UTC times: 


Console.WriteLine (loc1.ToUniversalTime() == utc1); // True 
Console.WriteLine (loc2.ToUniversalTime() == utc2); // True 


You can reliably compare any two DateTimes by first calling 
ToUniversalTime on each. This strategy fails if (and only if) 
exactly one of them has a DateTimeKind of Unspecified. This 
potential for failure is another reason for favoring 
DateTimeOffset. 
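A host-independent sketch of the “twilight zone” problem: using a hypothetical custom zone with Pacific-style rules (invented ID and names), two UTC instants an hour apart on the 2019 fall-back date map to the same local wall-clock time, whereas converting to DateTimeOffset keeps them distinct because the offset differs:

```csharp
using System;

static class TwilightDemo
{
  // Hypothetical zone: UTC-08:00, +1 hour from the second Sunday in March
  // to the first Sunday in November (transitions at 2:00 local time).
  static readonly TimeZoneInfo Zone = TimeZoneInfo.CreateCustomTimeZone (
    "Example Pacific", TimeSpan.FromHours (-8), "(UTC-08:00) Example",
    "Example Standard Time", "Example Daylight Time",
    new[] { TimeZoneInfo.AdjustmentRule.CreateAdjustmentRule (
      DateTime.MinValue.Date, DateTime.MaxValue.Date, TimeSpan.FromHours (1),
      TimeZoneInfo.TransitionTime.CreateFloatingDateRule (new DateTime (1, 1, 1, 2, 0, 0), 3, 2, DayOfWeek.Sunday),
      TimeZoneInfo.TransitionTime.CreateFloatingDateRule (new DateTime (1, 1, 1, 2, 0, 0), 11, 1, DayOfWeek.Sunday)) });

  // Two instants one hour apart, straddling the fall-back (2019-11-03 09:00 UTC).
  public static readonly DateTime Utc1 = new DateTime (2019, 11, 3, 8, 30, 0, DateTimeKind.Utc);
  public static readonly DateTime Utc2 = Utc1.AddHours (1);

  public static DateTime ToWallClock (DateTime utc) => TimeZoneInfo.ConvertTimeFromUtc (utc, Zone);
  public static DateTimeOffset ToOffsetTime (DateTime utc) =>
    TimeZoneInfo.ConvertTime (new DateTimeOffset (utc), Zone);

  static void Main()
  {
    Console.WriteLine (ToWallClock (Utc1) == ToWallClock (Utc2));   // True (both 01:30)
    Console.WriteLine (ToOffsetTime (Utc1).Offset);                 // -07:00:00
    Console.WriteLine (ToOffsetTime (Utc2).Offset);                 // -08:00:00
    Console.WriteLine (ToOffsetTime (Utc1) < ToOffsetTime (Utc2));  // True
  }
}
```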
Formatting and Parsing


Formatting means converting to a string; parsing means converting from a string.
The need to format or parse arises frequently in programming, in a variety of
situations. Hence, .NET Core provides a variety of mechanisms:


ToString and Parse 
These methods provide default functionality for many types. 


Format providers 
These manifest as additional ToString (and Parse) methods that accept a for- 
mat string and/or a format provider. Format providers are highly flexible and 
culture-aware. .NET Core includes format providers for the numeric types and 
DateTime/DateTimeOffset. 


XmlConvert
This is a static class with methods that format and parse while honoring XML
standards. XmlConvert is also useful for general-purpose conversion when you
need culture independence or you want to preempt misparsing. XmlConvert
supports the numeric types, bool, DateTime, DateTimeOffset, TimeSpan, and
Guid.


Type converters 
These target designers and XAML parsers. 


In this section, we discuss the first two mechanisms, focusing particularly on format 
providers. We then describe XmlConvert, type converters, and other conversion 
mechanisms. 


ToString and Parse 


The simplest formatting mechanism is the ToString method. It gives meaningful 
output on all simple value types (bool, DateTime, DateTimeOffset, TimeSpan, Guid, 
and all the numeric types). For the reverse operation, each of these types defines a 
static Parse method: 


string s = true.ToString();   // s = "True"
bool b = bool.Parse (s);      // b = true


If the parsing fails, a FormatException is thrown. Many types also define a
TryParse method, which returns false if the conversion fails rather than throwing
an exception:


bool failure = int.TryParse ("qwerty", out int i1); 
bool success = int.TryParse ("123", out int i2); 


If you don't care about the output and want to test only whether parsing would suc- 
ceed, you can use a discard: 


bool success = int.TryParse ("123", out int _); 
If you anticipate an error, calling TryParse is faster and more elegant than calling
Parse in an exception handling block.
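For example, a hypothetical ParseValid method can filter a batch of inputs with TryParse without ever entering an exception handler. This sketch passes NumberStyles.Integer and the invariant culture explicitly so its result does not depend on local settings:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class TryParseDemo
{
  // Keeps only the inputs that parse as integers; no exceptions are thrown or caught.
  public static List<int> ParseValid (IEnumerable<string> inputs)
  {
    var results = new List<int>();
    foreach (string s in inputs)
      if (int.TryParse (s, NumberStyles.Integer, CultureInfo.InvariantCulture, out int n))
        results.Add (n);
    return results;
  }

  static void Main()
  {
    List<int> ok = ParseValid (new[] { "123", "qwerty", "-7", "", null });
    Console.WriteLine (string.Join (",", ok));   // 123,-7
  }
}
```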


The Parse and TryParse methods on DateTime(Offset) and the numeric types 
respect local culture settings; you can change this by specifying a CultureInfo 
object. Specifying invariant culture is often a good idea. For instance, parsing 
“1.234” into a double gives us 1234 in Germany: 


Console.WriteLine (double.Parse ("1.234")); // 1234 (in Germany) 


This is because in Germany, the period indicates a thousands separator rather than a 
decimal point. Specifying invariant culture fixes this: 


double x = double.Parse ("1.234", CultureInfo.InvariantCulture);

The same applies when calling ToString():


string x = 1.234.ToString (CultureInfo.InvariantCulture);
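Because installed cultures differ between machines, the following sketch simulates German number conventions with a hand-built NumberFormatInfo (GermanLike is a hypothetical name) rather than requesting the de-DE culture, and compares it with the invariant culture:

```csharp
using System;
using System.Globalization;

static class CultureParseDemo
{
  // Stand-in for German conventions: "." groups digits, "," is the decimal point.
  public static readonly NumberFormatInfo GermanLike = new NumberFormatInfo
  {
    NumberDecimalSeparator = ",",
    NumberGroupSeparator = "."
  };

  static void Main()
  {
    Console.WriteLine (double.Parse ("1.234", GermanLike));                          // 1234
    Console.WriteLine (double.Parse ("1.234", CultureInfo.InvariantCulture) == 1.234);  // True
    Console.WriteLine (1.234.ToString (GermanLike));                                  // 1,234
    Console.WriteLine (1.234.ToString (CultureInfo.InvariantCulture));                // 1.234
  }
}
```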


Format Providers 


Sometimes, you need more control over how formatting and parsing take place. 
There are dozens of ways to format a DateTime(Offset), for instance. Format pro- 
viders allow extensive control over formatting and parsing, and are supported for 
numeric types and date/times. Format providers are also used by user interface con- 
trols for formatting and parsing. 


The gateway to using a format provider is IFormattable. All numeric types, and
DateTime(Offset), implement this interface:


public interface IFormattable
{
  string ToString (string format, IFormatProvider formatProvider);
}


The first argument is the format string; the second is the format provider. The format 
string provides instructions; the format provider determines how the instructions 
are translated. For example: 


NumberFormatInfo f = new NumberFormatInfo(); 
f.CurrencySymbol = "$$"; 
Console.WriteLine (3.ToString ("C", f));   // $$3.00


Here, "C" is a format string that indicates currency, and the NumberFormatInfo 
object is a format provider that determines how currency—and other numeric rep- 
resentations—are rendered. This mechanism allows for globalization. 


All format strings for numbers and dates are listed in “Stan- 
dard Format Strings and Parsing Flags” on page 275. 


If you specify a null format string or provider, a default is applied. The default for- 
mat provider is CultureInfo.CurrentCulture, which, unless reassigned, reflects 
the computer’s runtime control panel settings. For example, on this computer: 
Console.WriteLine (10.3.ToString ("C", null));   // $10.30


For convenience, most types overload ToString such that you can omit a null 
provider: 

Console.WriteLine (10.3.ToString ("C")); // $10.30 

Console.WriteLine (10.3.ToString ("F4")); // 10.3000 (Fix to 4 D.P.) 


Calling ToString on a DateTime(Offset) or a numeric type with no arguments is 
equivalent to using a default format provider, with an empty format string. 


.NET Core defines three format providers (all of which implement
IFormatProvider):


NumberFormatInfo
DateTimeFormatInfo
CultureInfo


All enum types are also formattable, though there's no special 
IFormatProvider class. 


Format providers and CultureInfo


Within the context of format providers, CultureInfo acts as an indirection mecha- 
nism for the other two format providers, returning a NumberFormatInfo or 
DateTimeFormatInfo object applicable to the culture’s regional settings. 


In the following example, we request a specific culture (English language in Great
Britain):


CultureInfo uk = CultureInfo.GetCultureInfo ("en-GB"); 
Console.WriteLine (3.ToString ("C", uk)); // £3.00 


This executes using the default NumberFormatInfo object applicable to the en-GB 
culture. 


The next example formats a DateTime with invariant culture. Invariant culture is 
always the same, regardless of the computer’s settings: 


DateTime dt = new DateTime (2000, 1, 2); 

CultureInfo iv = CultureInfo.InvariantCulture; 

Console.WriteLine (dt.ToString (iv)); // 01/02/2000 00:00:00 
Console.WriteLine (dt.ToString ("d", iv));   // 01/02/2000


Invariant culture is based on American culture, with the
following differences:

• The currency symbol is ¤ instead of $.

• Dates and times are formatted with leading zeros
(though still with the month first).

• Time uses the 24-hour format rather than an AM/PM
designator.
Using NumberFormatInfo or DateTimeFormatInfo


In the next example, we instantiate a NumberFormatInfo and change the group
separator from a comma to a space. We then use it to format a number to three
decimal places.


NumberFormatInfo f = new NumberFormatInfo();
f.NumberGroupSeparator = " ";
Console.WriteLine (12345.6789.ToString ("N3", f));   // 12 345.679

The initial settings for a NumberFormatInfo or DateTimeFormatInfo are based on
the invariant culture. Sometimes, however, it’s more useful to choose a different
starting point. To do this, you can Clone an existing format provider:


NumberFormatInfo f = (NumberFormatInfo) 
CultureInfo.CurrentCulture.NumberFormat.Clone(); 
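The sketch below (SpaceGrouped is a hypothetical name) shows that the invariant culture’s NumberFormatInfo is read-only while its clone is writable, and reproduces the space-separated output from the earlier example:

```csharp
using System;
using System.Globalization;

static class CloneDemo
{
  // Clones the read-only invariant settings, then customizes the writable copy.
  public static NumberFormatInfo SpaceGrouped()
  {
    var f = (NumberFormatInfo) CultureInfo.InvariantCulture.NumberFormat.Clone();
    f.NumberGroupSeparator = " ";   // on the original, this would throw InvalidOperationException
    return f;
  }

  static void Main()
  {
    Console.WriteLine (CultureInfo.InvariantCulture.NumberFormat.IsReadOnly);  // True
    NumberFormatInfo f = SpaceGrouped();
    Console.WriteLine (f.IsReadOnly);                                        // False
    Console.WriteLine (12345.6789.ToString ("N3", f));                       // 12 345.679
  }
}
```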


A cloned format provider is always writable—even if the original was read-only. 


Composite formatting 


Composite format strings allow you to combine variable substitution with format 
strings. The static string.Format method accepts a composite format string (we 
illustrated this in “string.Format and composite format strings” on page 248): 


string composite = "Credit={0:C}"; 
Console.WriteLine (string.Format (composite, 500)); // Credit=$500.00 


The Console class itself overloads its Write and WriteLine methods to accept com- 
posite format strings, allowing us to shorten this example slightly: 


Console.WriteLine ("Credit={0:C}", 500); // Credit=$500.00 


You can also append a composite format string to a StringBuilder (via Append 
Format), and to a TextWriter for I/O (see Chapter 15). 
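As a sketch, both StringBuilder.AppendFormat and a StringWriter (a TextWriter) can be given a format provider, here the invariant culture, so that the composite format renders the same way on any machine. The method names are hypothetical:

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class CompositeDemo
{
  public static string WithStringBuilder (decimal credit)
  {
    var sb = new StringBuilder();
    // AppendFormat accepts an optional IFormatProvider as its first argument.
    sb.AppendFormat (CultureInfo.InvariantCulture, "Credit={0:N2}", credit);
    return sb.ToString();
  }

  public static string WithTextWriter (decimal credit)
  {
    // The StringWriter's FormatProvider is applied to composite format strings.
    using var writer = new StringWriter (CultureInfo.InvariantCulture);
    writer.Write ("Credit={0:N2}", credit);
    return writer.ToString();
  }

  static void Main()
  {
    Console.WriteLine (WithStringBuilder (1234.5m));   // Credit=1,234.50
    Console.WriteLine (WithTextWriter (1234.5m));      // Credit=1,234.50
  }
}
```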


string.Format accepts an optional format provider. A simple application for this is 
to call ToString on an arbitrary object while passing in a format provider: 


string s = string.Format (CultureInfo.InvariantCulture, "{0}", someObject);

This is equivalent to the following:


string s;
if (someObject is IFormattable)
  s = ((IFormattable)someObject).ToString (null,
                                           CultureInfo.InvariantCulture);
else if (someObject == null)
  s = "";
else
  s = someObject.ToString();
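The equivalence can be checked with a hypothetical FormatInvariant method that follows the same three steps, comparing its output with string.Format for a few sample values:

```csharp
using System;
using System.Globalization;

static class InvariantFormatDemo
{
  // Follows the same steps as the equivalent code: IFormattable, then null, then ToString.
  public static string FormatInvariant (object o)
  {
    if (o is IFormattable f) return f.ToString (null, CultureInfo.InvariantCulture);
    if (o == null) return "";
    return o.ToString();
  }

  static void Main()
  {
    foreach (object o in new object[] { 1.5, new DateTime (2000, 1, 2), null, "text" })
    {
      string expected = string.Format (CultureInfo.InvariantCulture, "{0}", o);
      Console.WriteLine (FormatInvariant (o) == expected);   // True
    }
  }
}
```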





Formatting and Parsing | 273 







Parsing with format providers 


There’s no standard interface for parsing through a format provider. Instead, each 
participating type overloads its static Parse (and TryParse) method to accept a for- 
mat provider, and optionally, a NumberStyles or DateTimeStyles enum. 


NumberStyles and DateTimeStyles control how parsing works: they let you specify
such things as whether parentheses or a currency symbol can appear in the input
string. (By default, the answer to both questions is no.) For example:


int error = int.Parse ("(2)"); // Exception thrown 


int minusTwo = int.Parse ("(2)", NumberStyles.Integer |
                                 NumberStyles.AllowParentheses);   // OK


decimal fivePointTwo = decimal.Parse ("£5.20", NumberStyles.Currency, 
CultureInfo.GetCultureInfo ("en-GB")); 


The next section lists all NumberStyles and DateTimeStyles members as well as the 
default parsing rules for each type. 


IFormatProvider and ICustomFormatter 
All format providers implement IFormatProvider: 
public interface IFormatProvider { object GetFormat (Type formatType); } 


The purpose of this method is to provide indirection; this is what allows
CultureInfo to defer to an appropriate NumberFormatInfo or DateTimeFormatInfo
object to do the work.


By implementing IFormatProvider—along with ICustomFormatter—you can also 
write your own format provider that works in conjunction with existing types. 
ICustomFormatter defines a single method, as follows: 


string Format (string format, object arg, IFormatProvider formatProvider); 
The following custom format provider writes numbers as words: 


public class WordyFormatProvider : IFormatProvider, ICustomFormatter
{
  static readonly string[] _numberWords =
   "zero one two three four five six seven eight nine minus point".Split();

  IFormatProvider _parent;   // Allows consumers to chain format providers

  public WordyFormatProvider () : this (CultureInfo.CurrentCulture) { }
  public WordyFormatProvider (IFormatProvider parent) => _parent = parent;

  public object GetFormat (Type formatType)
  {
    if (formatType == typeof (ICustomFormatter)) return this;
    return null;
  }

  public string Format (string format, object arg, IFormatProvider prov)
  {
    // If it's not our format string, defer to the parent provider:
    if (arg == null || format != "W")
      return string.Format (_parent, "{0:" + format + "}", arg);

    StringBuilder result = new StringBuilder();
    string digitList = string.Format (CultureInfo.InvariantCulture,
                                      "{0}", arg);
    foreach (char digit in digitList)
    {
      int i = "0123456789-.".IndexOf (digit);
      if (i == -1) continue;
      if (result.Length > 0) result.Append (' ');
      result.Append (_numberWords[i]);
    }
    return result.ToString();
  }
}
Notice that in the Format method, we used string.Format with InvariantCulture
to convert the input number to a string. It would have been much simpler
just to call ToString() on arg, but then CurrentCulture would have been used
instead. The reason for needing the invariant culture is evident a few lines later:


int i = "0123456789-.".IndexOf (digit); 


It’s critical here that the number string comprises only the characters 0123456789-. 
and not any internationalized versions of these. 


Here's an example of using WordyFormatProvider: 


double n = -123.45; 
IFormatProvider fp = new WordyFormatProvider(); 
Console.WriteLine (string.Format (fp, "{@:C} in words is {O:W}", n)); 
You can use custom format providers only in composite format strings.


Standard Format Strings and Parsing Flags 


The standard format strings control how a numeric type or DateTime/DateTimeOffset
is converted to a string. There are two kinds of format strings:


Standard format strings
With these, you provide general guidance. A standard format string consists of
a single letter, followed, optionally, by a digit (whose meaning depends on the
letter). An example is "C" or "F2".


Custom format strings
With these, you micromanage every character with a template. An example is
"0:#.000E+00".


Custom format strings are unrelated to custom format providers.


Numeric Format Strings 


Table 6-2 lists all standard numeric format strings. 


Table 6-2. Standard numeric format strings

Letter    Meaning            Sample input      Result          Notes
G or g    “General”          1.2345, "G"       1.2345          Switches to exponential notation
                             0.00001, "G"      1E-05           for small or large numbers.
                             0.00001, "g"      1e-05           G3 limits precision to three digits
                             1.2345, "G3"      1.23            in total (before + after point).
                             12345, "G3"       1.23E+04
F         Fixed point        2345.678, "F2"    2345.68         F2 rounds to two decimal places.
                             2345.6, "F2"      2345.60
N         Fixed point with   2345.678, "N2"    2,345.68        As above, with group (1,000s)
          group separator    2345.6, "N2"      2,345.60        separator (details from format
          (“Numeric”)                                          provider).
D         Pad with           123, "D5"         00123           For integral types only.
          leading zeros      123, "D1"         123             D5 pads left to five digits; does
                                                               not truncate.
E or e    Force              56789, "E"        5.678900E+004   Six-digit default precision.
          exponential        56789, "e"        5.678900e+004
          notation           56789, "E2"       5.68E+004
C         Currency           1.2, "C"          $1.20           C with no digit uses default
                             1.2, "C4"         $1.2000         number of D.P. from format
                                                               provider.
P         Percent            .503, "P"         50.30%          Uses symbol and layout from
                             .503, "P0"        50%             format provider. Decimal places
                                                               can optionally be overridden.
X or x    Hexadecimal        47, "X"           2F              X for uppercase hex digits; x for
                             47, "x"           2f              lowercase hex digits.
                             47, "X4"          002F            Integrals only.
R or      Round-trip         1f / 3f, "R"      0.333333343     Use R for BigInteger, G17 for
G9/G17                                                         double, or G9 for float.

Supplying no numeric format string (or a null or blank string) is equivalent to using
the "G" standard format string followed by no digit. This exhibits the following
behavior:

• Numbers smaller than 10^-5 or larger than the type’s precision are expressed in
exponential (scientific) notation.

• The two decimal places at the limit of float or double’s precision are rounded
away to mask the inaccuracies inherent in conversion to decimal from their
underlying binary form.

The automatic rounding just described is usually beneficial
and goes unnoticed. However, it can cause trouble if you need
to round-trip a number; in other words, convert it to a string
and back again (maybe repeatedly) while preserving value
equality. For this reason, the R, G17, and G9 format strings exist
to circumvent this implicit rounding.
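A quick way to see why round-trip formats matter, sketched with the invariant culture: 0.1 + 0.2 is not exactly 0.3 in binary floating point, so formatting it with only 15 significant digits and parsing it back loses the difference, whereas G17 and R preserve it. FormatAndParse is a hypothetical helper:

```csharp
using System;
using System.Globalization;

static class RoundTripNumberDemo
{
  // Formats with the given format string, then parses the text back to a double.
  public static double FormatAndParse (double value, string format) =>
    double.Parse (value.ToString (format, CultureInfo.InvariantCulture),
                  CultureInfo.InvariantCulture);

  static void Main()
  {
    double d = 0.1 + 0.2;   // slightly more than the double closest to 0.3
    Console.WriteLine (d.ToString ("G15", CultureInfo.InvariantCulture));  // 0.3
    Console.WriteLine (d.ToString ("G17", CultureInfo.InvariantCulture));  // 0.30000000000000004
    Console.WriteLine (FormatAndParse (d, "G15") == d);   // False: precision lost
    Console.WriteLine (FormatAndParse (d, "G17") == d);   // True
    Console.WriteLine (FormatAndParse (d, "R") == d);     // True
  }
}
```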


Table 6-3 lists custom numeric format strings. 


Table 6-3. Custom numeric format strings

Specifier    Meaning               Sample input          Result       Notes
#            Digit placeholder     12.345, ".##"         12.35        Limits digits after D.P.
                                   12.345, ".####"       12.345
0            Zero placeholder      12.345, ".00"         12.35        As above, but also pads with
                                   12.345, ".0000"       12.3450      zeros before and after D.P.
                                   99, "000.00"          099.00
.            Decimal point                                            Indicates D.P. Actual symbol
                                                                      comes from NumberFormatInfo.
,            Group separator       1234, "#,###,###"     1,234        Symbol comes from
                                   1234, "0,000,000"     0,001,234    NumberFormatInfo.
,            Multiplier            1000000, "#,"         1000         If comma is at end or before
             (as above)            1000000, "#,,"        1            D.P., it acts as a multiplier,
                                                                      dividing result by 1,000,
                                                                      1,000,000, etc.
%            Percent notation      0.6, "00%"            60%          First multiplies by 100 and
                                                                      then substitutes percent symbol
                                                                      obtained from NumberFormatInfo.
E0, e0,      Exponent notation     1234, "0E0"           1E3
E+0, e+0,                          1234, "0E+0"          1E+3
E-0, e-0                           1234, "0.00E00"       1.23E03
                                   1234, "0.00e00"       1.23e03
\            Literal character     50, @"\#0"            #50          Use in conjunction with an @
             quote                                                    prefix on the string, or use \\
'xx' "xx"    Literal string quote  50, "0 '...'"         50 ...
;            Section separator     15, "#;(#);zero"      15           (If positive)
                                   -5, "#;(#);zero"      (5)          (If negative)
                                   0, "#;(#);zero"       zero         (If zero)
Any other    Literal               35.2, "$0 . 00c"      $35 . 20c
char
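The following sketch checks several rows of Table 6-3 with the invariant culture, so the symbols are the same on every machine; F is a hypothetical shorthand for ToString with the invariant culture:

```csharp
using System;
using System.Globalization;

static class CustomFormatDemo
{
  // Formats with the invariant culture so output does not depend on machine settings.
  public static string F (IFormattable value, string format) =>
    value.ToString (format, CultureInfo.InvariantCulture);

  static void Main()
  {
    Console.WriteLine (F (99, "000.00"));          // 099.00
    Console.WriteLine (F (1234, "0,000,000"));     // 0,001,234
    Console.WriteLine (F (1000000, "#,"));         // 1000
    Console.WriteLine (F (0.6, "00%"));            // 60%
    Console.WriteLine (F (1234, "0E+0"));          // 1E+3
    Console.WriteLine (F (50, @"\#0"));            // #50
    Console.WriteLine (F (-5, "#;(#);zero"));      // (5)
    Console.WriteLine (F (0, "#;(#);zero"));       // zero
  }
}
```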

NumberStyles 


Each numeric type defines a static Parse method that accepts a NumberStyles argu- 
ment. NumberStyles is a flags enum that lets you determine how the string is read as 
it’s converted to a numeric type. It has the following combinable members: 


AllowLeadingWhite      AllowTrailingWhite
AllowLeadingSign       AllowTrailingSign
AllowParentheses       AllowDecimalPoint
AllowThousands         AllowExponent
AllowCurrencySymbol    AllowHexSpecifier

NumberStyles also defines these composite members:

None  Integer  Float  Number  HexNumber  Currency  Any


Except for None, all composite values include AllowLeadingWhite and
AllowTrailingWhite. Figure 6-1 shows their remaining makeup, with the most useful
three emphasized.














Figure 6-1. Composite NumberStyles 


When you call Parse without specifying any flags, the defaults illustrated in 
Figure 6-2 are applied. 
Figure 6-2. Default parsing flags for numeric types (integral types: Integer)


If you don't want the defaults shown in Figure 6-2, you must explicitly specify 
NumberStyles: 


int thousand = int.Parse ("3E8", NumberStyles.HexNumber);
int minusTwo = int.Parse ("(2)", NumberStyles.Integer |
                                 NumberStyles.AllowParentheses);
double aMillion = double.Parse ("1,000,000", NumberStyles.Any);
decimal threeMillion = decimal.Parse ("3e6", NumberStyles.Any);
decimal fivePointTwo = decimal.Parse ("$5.20", NumberStyles.Currency);


Because we didn’t specify a format provider, this example works with your local cur- 
rency symbol, group separator, decimal point, and so on. The next example is hard- 
coded to work with the euro sign and a blank group separator for currencies: 


NumberFormatInfo ni = new NumberFormatInfo();
ni.CurrencySymbol = "€";
ni.CurrencyGroupSeparator = " ";
double million = double.Parse ("€1 000 000", NumberStyles.Currency, ni);
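When the input may be malformed, TryParse avoids the cost and noise of an exception. The following sketch passes an explicit NumberStyles and format provider to TryParse; the helper name ParseAccounting is hypothetical. It accepts accounting-style negatives such as "(2)" and returns null, rather than throwing, when the text doesn't match.

```csharp
using System;
using System.Globalization;

public static class NumberStylesDemo
{
    // Parses an integer that may be written in accounting style, e.g. "(2)" for -2.
    // Returns null instead of throwing when the text doesn't match the styles.
    public static int? ParseAccounting (string text)
    {
        var styles = NumberStyles.Integer | NumberStyles.AllowParentheses;
        return int.TryParse (text, styles, CultureInfo.InvariantCulture, out int result)
            ? result
            : (int?) null;
    }

    public static void Main()
    {
        Console.WriteLine (ParseAccounting ("(2)"));           // -2
        Console.WriteLine (ParseAccounting (" 42 "));          // 42 (Integer allows surrounding whitespace)
        Console.WriteLine (ParseAccounting ("2.5") == null);   // True (no AllowDecimalPoint)

        // The default for int (NumberStyles.Integer) rejects parentheses:
        Console.WriteLine (int.TryParse ("(2)", out _));       // False
    }
}
```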
Date/Time Format Strings


Format strings for DateTime/DateTimeOffset can be divided into two groups based 
on whether they honor culture and format provider settings. Table 6-4 lists those 
that do; Table 6-5 lists those that don’t. The sample output comes from formatting 
the following DateTime (with invariant culture, in the case of Table 6-4): 


new DateTime (2000, 1, 2, 17, 18, 19); 


Table 6-4. Culture-sensitive date/time format strings 


Format string   Meaning                    Sample output

d               Short date                 01/02/2000
D               Long date                  Sunday, 02 January 2000
t               Short time                 17:18
T               Long time                  17:18:19
f               Long date + short time     Sunday, 02 January 2000 17:18
F               Long date + long time      Sunday, 02 January 2000 17:18:19
g               Short date + short time    01/02/2000 17:18
G (default)     Short date + long time     01/02/2000 17:18:19
m, M            Month and day              02 January
y, Y            Year and month             January 2000





Table 6-5. Culture-insensitive date/time format strings


Format   Meaning              Sample output                       Notes
string

o        Round-trippable      2000-01-02T17:18:19.0000000         Will append time zone information
                                                                  unless DateTimeKind is Unspecified
r, R     RFC 1123 standard    Sun, 02 Jan 2000 17:18:19 GMT       You must explicitly convert to UTC
                                                                  with DateTime.ToUniversalTime
s        Sortable; ISO 8601   2000-01-02T17:18:19                 Compatible with text-based sorting
u        "Universal"          2000-01-02 17:18:19Z                Similar to above; must explicitly
         sortable                                                 convert to UTC
U        UTC                  Sunday, 02 January 2000 17:18:19    Long date + long time, converted
                                                                  to UTC
17:18:19 converted to UTC 





The format strings "r", "R", and "u" emit a suffix that implies UTC; yet they don’t 
automatically convert a local to a UTC DateTime (so you must do the conversion 
yourself). Ironically, "U" automatically converts to UTC, but doesn’t write a time 
zone suffix! In fact, "o" is the only format specifier in the group that can write an 
unambiguous DateTime without intervention. 


DateTimeFormatInfo also supports custom format strings: these are analogous to 
numeric custom format strings. The list is extensive and is available online in 
Microsoft’s documentation. Here’s an example of a custom format string: 


yyyy-MM-dd HH:mm:ss 
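To check some of the sample output above, the following sketch formats the same DateTime with a few standard format strings and with the custom string just shown. Like the tables, it assumes the invariant culture; DateFormatDemo and Fmt are illustrative names.

```csharp
using System;
using System.Globalization;

public static class DateFormatDemo
{
    static readonly DateTime Sample = new DateTime (2000, 1, 2, 17, 18, 19);

    // Formats the sample date with the invariant culture so the result is stable.
    public static string Fmt (string format) =>
        Sample.ToString (format, CultureInfo.InvariantCulture);

    public static void Main()
    {
        Console.WriteLine (Fmt ("d"));                     // 01/02/2000
        Console.WriteLine (Fmt ("D"));                     // Sunday, 02 January 2000
        Console.WriteLine (Fmt ("T"));                     // 17:18:19
        Console.WriteLine (Fmt ("s"));                     // 2000-01-02T17:18:19
        Console.WriteLine (Fmt ("yyyy-MM-dd HH:mm:ss"));   // 2000-01-02 17:18:19
    }
}
```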


Parsing and misparsing DateTimes 


Strings that put the month or day first are ambiguous and can easily be misparsed,
particularly if you have global customers. This is not a problem in user interface
controls, because the same settings are in force when parsing as when formatting.
But when writing to a file, for instance, day/month misparsing can be a real
problem. There are two solutions:


• Always state the same explicit culture when formatting and parsing (e.g.,
invariant culture).


• Format DateTime and DateTimeOffsets in a manner independent of culture.


The second approach is more robust, particularly if you choose a format that puts
the four-digit year first: such strings are much more difficult to misparse by another
party. Further, strings formatted with a standards-compliant year-first format (such
as "o") can parse correctly alongside locally formatted strings, rather like a
"universal donor." (Dates formatted with "s" or "u" have the further benefit of being
sortable.)


To illustrate, suppose that we generate a culture-insensitive DateTime string s as 
follows: 


string s = DateTime.Now.ToString ("o"); 


The "o" format string includes fractional seconds (to seven
digits) in the output. The following custom format string gives
the same result as "o", but without fractional seconds:


yyyy-MM-ddTHH:mm:ssK


We can reparse this in two ways. ParseExact demands strict compliance with the 
specified format string: 


DateTime dt1 = DateTime.ParseExact (s, "o", null); 


(You can achieve a similar result with XmlConvert’s ToString and ToDateTime
methods.)


Parse, however, implicitly accepts both the "o" format and the CurrentCulture 
format: 
DateTime dt2 = DateTime.Parse (s);


This works with both DateTime and DateTimeOffset. 


ParseExact is usually preferable if you know the format of the
string that you’re parsing. It means that if the string is
incorrectly formatted, an exception will be thrown, which is
usually better than risking a misparsed date.
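Putting the two recommendations together (a culture-independent "o" string, and strict parsing), a round trip might look like the following sketch. It uses TryParseExact with DateTimeStyles.RoundtripKind (covered next) so that the "Z" suffix comes back as DateTimeKind.Utc; the class and method names are illustrative.

```csharp
using System;
using System.Globalization;

public static class RoundTripDemo
{
    // Writes a DateTime in the round-trippable "o" format.
    public static string Write (DateTime dt) =>
        dt.ToString ("o", CultureInfo.InvariantCulture);

    // Reads it back strictly; RoundtripKind restores the DateTimeKind from the suffix.
    public static bool TryRoundTrip (DateTime original, out DateTime parsed) =>
        DateTime.TryParseExact (Write (original), "o", CultureInfo.InvariantCulture,
                                DateTimeStyles.RoundtripKind, out parsed);

    public static void Main()
    {
        var utc = new DateTime (2000, 1, 2, 17, 18, 19, DateTimeKind.Utc);
        Console.WriteLine (Write (utc));    // 2000-01-02T17:18:19.0000000Z

        bool ok = TryRoundTrip (utc, out DateTime back);
        Console.WriteLine (ok && back == utc && back.Kind == DateTimeKind.Utc);  // True

        // A string in some other format is rejected rather than misparsed:
        Console.WriteLine (DateTime.TryParseExact ("02/01/2000", "o",
            CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind, out _));  // False
    }
}
```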


DateTimeStyles 


DateTimeStyles is a flags enum that provides additional instructions when calling 
Parse on a DateTime(Offset). Here are its members: 


None,
AllowLeadingWhite, AllowTrailingWhite, AllowInnerWhite,
AssumeLocal, AssumeUniversal, AdjustToUniversal,
NoCurrentDateDefault, RoundtripKind


There is also a composite member, AllowWhiteSpaces:

AllowWhiteSpaces = AllowLeadingWhite | AllowTrailingWhite | AllowInnerWhite


The default is None. This means that extra whitespace is normally prohibited (white- 
space that’s part of a standard DateTime pattern is exempt). 


AssumeLocal and AssumeUniversal apply if the string doesn’t have a time zone suf- 
fix (such as Z or +9:00). AdjustToUniversal still honors time zone suffixes, but 
then converts to UTC using the current regional settings. 


If you parse a string comprising a time but no date, today’s date is applied by 
default. If you apply the NoCurrentDateDefault flag, however, it instead uses 1st 
January 0001. 
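A minimal sketch of two of these flags, assuming the invariant culture: NoCurrentDateDefault for a time-only string, and AssumeUniversal combined with AdjustToUniversal for a string that has no time zone suffix. The helper names ParseTimeOnly and ParseAsUtc are hypothetical.

```csharp
using System;
using System.Globalization;

public static class DateTimeStylesDemo
{
    // Time only: NoCurrentDateDefault supplies 1 January 0001 instead of today.
    public static DateTime ParseTimeOnly (string text) =>
        DateTime.Parse (text, CultureInfo.InvariantCulture,
                        DateTimeStyles.NoCurrentDateDefault);

    // No suffix in the text: AssumeUniversal treats it as UTC, and
    // AdjustToUniversal keeps the result in UTC (Kind == Utc, no shift).
    public static DateTime ParseAsUtc (string text) =>
        DateTime.Parse (text, CultureInfo.InvariantCulture,
                        DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);

    public static void Main()
    {
        DateTime t = ParseTimeOnly ("17:18:19");
        Console.WriteLine (t.ToString ("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
        // 0001-01-01 17:18:19

        DateTime u = ParseAsUtc ("2000-01-02 17:18:19");
        Console.WriteLine (u.Kind + " " + u.Hour);   // Utc 17
    }
}
```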

Enum Format Strings 


In “Enums” on page 131 in Chapter 3, we described formatting and parsing enum 
values. Table 6-6 lists each format string and the result of applying it to the follow- 
ing expression: 


Console.WriteLine (System.ConsoleColor.Red.ToString (formatString)); 


Table 6-6. Enum format strings 


Format string   Meaning                   Sample output   Notes

G or g          "General"                 Red             Default
F or f          Treat as though Flags     Red             Works on combined members even if
                attribute were present                    enum has no Flags attribute
D or d          Decimal value             12              Retrieves underlying integral value
X or x          Hexadecimal value         0000000C        Retrieves underlying integral value
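The following sketch reproduces two rows of Table 6-6 and shows what "F" adds for an enum that lacks the Flags attribute. The Colors enum is a made-up example type.

```csharp
using System;

public static class EnumFormatDemo
{
    // Hypothetical enum WITHOUT the [Flags] attribute.
    public enum Colors { Red = 1, Green = 2 }

    public static void Main()
    {
        Console.WriteLine (ConsoleColor.Red.ToString ("D"));   // 12
        Console.WriteLine (ConsoleColor.Red.ToString ("X"));   // 0000000C

        var combined = Colors.Red | Colors.Green;   // value 3: not a declared member
        Console.WriteLine (combined.ToString ("G"));   // 3
        Console.WriteLine (combined.ToString ("F"));   // Red, Green
    }
}
```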





Other Conversion Mechanisms 


In the previous two sections, we covered format providers—.NET’s primary mecha- 
nism for formatting and parsing. Other important conversion mechanisms are scat- 
tered through various types and namespaces. Some convert to and from string, 
and some do other kinds of conversions. In this section, we discuss the following 
topics: 


• The Convert class and its functions:
  - Real-to-integral conversions that round rather than truncate
  - Parsing numbers in base 2, 8, and 16
  - Dynamic conversions
  - Base-64 translations


• XmlConvert and its role in formatting and parsing for XML


• Type converters and their role in formatting and parsing for designers and
XAML


• BitConverter, for binary conversions


Convert 
.NET Core calls the following types base types: 


• bool, char, string, System.DateTime, and System.DateTimeOffset


• All the C# numeric types


The static Convert class defines methods for converting every base type to every 
other base type. Unfortunately, most of these methods are useless: either they throw 
exceptions or they are redundant alongside implicit casts. Among the clutter, how- 
ever, are some useful methods, listed in the following sections. 


All base types (explicitly) implement IConvertible, which
defines methods for converting to every other base type. In
most cases, the implementation of each of these methods
simply calls a method in Convert. On rare occasions, it can be
useful to write a method that accepts an argument of type
IConvertible.


Rounding real-to-integral conversions 


In Chapter 2, we saw how implicit and explicit casts allow you to convert between 
numeric types. In summary: 


• Implicit casts work for nonlossy conversions (e.g., int to double).


• Explicit casts are required for lossy conversions (e.g., double to int).


Casts are optimized for efficiency; hence, they truncate data that won't fit. This can 
be a problem when converting from a real number to an integer, because often you 
want to round rather than truncate. Convert’s numerical conversion methods 
address just this issue—they always round: 


double d = 3.9;
int i = Convert.ToInt32 (d);    // i == 4


Convert uses banker’s rounding, which snaps midpoint values to even integers (this
avoids positive or negative bias). If banker’s rounding is a problem, first call
Math.Round on the real number: this accepts an additional argument that allows you
to control midpoint rounding.
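The difference between casting, Convert, and Math.Round shows up clearly at midpoints. In this sketch, RoundHalfAwayFromZero is a hypothetical helper that applies the workaround just described.

```csharp
using System;

public static class RoundingDemo
{
    // Hypothetical helper: round midpoints away from zero, then convert.
    public static int RoundHalfAwayFromZero (double d) =>
        Convert.ToInt32 (Math.Round (d, MidpointRounding.AwayFromZero));

    public static void Main()
    {
        Console.WriteLine ((int) 3.9);                     // 3  (cast truncates)
        Console.WriteLine (Convert.ToInt32 (3.9));         // 4  (Convert rounds)
        Console.WriteLine (Convert.ToInt32 (2.5));         // 2  (banker's: to even)
        Console.WriteLine (Convert.ToInt32 (3.5));         // 4  (banker's: to even)
        Console.WriteLine (RoundHalfAwayFromZero (2.5));   // 3
    }
}
```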
Parsing numbers in base 2, 8, and 16


Hidden among the To(integral-type) methods are overloads that parse numbers 
in another base: 


int thirty = Convert.ToInt32 ("1E", 16); // Parse in hexadecimal 
uint five = Convert.ToUInt32 ("101", 2); // Parse in binary 


The second argument specifies the base. It can be any base you like—as long as it’s 2, 
8, 10, or 16! 
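Convert also works in the other direction: Convert.ToString (value, toBase) formats an integer in base 2, 8, 10, or 16. A small sketch, with hypothetical wrapper names ToBase and FromBase:

```csharp
using System;

public static class BaseDemo
{
    // Thin wrappers over Convert; only bases 2, 8, 10 and 16 are accepted.
    public static string ToBase (int value, int toBase) => Convert.ToString (value, toBase);
    public static int FromBase (string text, int fromBase) => Convert.ToInt32 (text, fromBase);

    public static void Main()
    {
        Console.WriteLine (FromBase ("1E", 16));   // 30
        Console.WriteLine (ToBase (30, 16));       // 1e
        Console.WriteLine (ToBase (5, 2));         // 101
        Console.WriteLine (FromBase ("777", 8));   // 511

        try { FromBase ("12", 3); }
        catch (ArgumentException) { Console.WriteLine ("Base 3 is not supported"); }
    }
}
```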


Dynamic conversions 


Occasionally, you need to convert from one type to another, but you don't know 
what the types are until runtime. For this, the Convert class provides a ChangeType 
method: 


public static object ChangeType (object value, Type conversionType); 


The source and target types must be one of the base types. ChangeType also accepts 
an optional IFormatProvider argument. Here's an example: 


Type targetType = typeof (int); 
object source = "42"; 


object result = Convert.ChangeType (source, targetType); 


Console.WriteLine (result); // 42 

Console.WriteLine (result.GetType());  // System.Int32
An example of when this might be useful is in writing a deserializer that can work 
with multiple types. It can also convert any enum to its integral type (see “Enums” 
on page 131 in Chapter 3). 


A limitation of ChangeType is that you cannot specify a format string or parsing flag. 
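As a sketch of the deserializer scenario, the hypothetical FromText helper below wraps ChangeType and passes CultureInfo.InvariantCulture as the optional IFormatProvider, so "3.5" is read the same way on every machine. It inherits ChangeType's limitation: there is still no way to pass a format string or NumberStyles.

```csharp
using System;
using System.Globalization;

public static class ChangeTypeDemo
{
    // Converts text to any base type chosen at runtime.
    public static object FromText (string text, Type targetType) =>
        Convert.ChangeType (text, targetType, CultureInfo.InvariantCulture);

    // Convenience overload when the type is known at compile time.
    public static T FromText<T> (string text) => (T) FromText (text, typeof (T));

    public static void Main()
    {
        Console.WriteLine (FromText<int> ("42") + 1);        // 43
        Console.WriteLine (FromText<double> ("3.5") * 2);    // 7
        Console.WriteLine (FromText<bool> ("True"));         // True
        Console.WriteLine (FromText ("42", typeof (long)).GetType());   // System.Int64
    }
}
```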


Base-64 conversions 


Sometimes, you need to include binary data such as a bitmap within a text docu- 
ment such as an XML file or email message. Base 64 is a ubiquitous means of 
encoding binary data as readable characters, using 64 characters from the ASCII set. 


Convert’s ToBase64String method converts from a byte array to base 64; 
FromBase64String does the reverse. 
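A round trip through base 64 takes two calls. The sketch below wraps them in hypothetical Encode and Decode helpers; note that the encoded text is longer than the input (four characters for every three bytes, padded with "=").

```csharp
using System;
using System.Linq;

public static class Base64Demo
{
    public static string Encode (byte[] data) => Convert.ToBase64String (data);
    public static byte[] Decode (string text) => Convert.FromBase64String (text);

    public static void Main()
    {
        byte[] data = { 1, 2, 3, 250 };
        string text = Encode (data);            // uses only A-Z, a-z, 0-9, +, / and =
        Console.WriteLine (text);               // AQID+g==
        Console.WriteLine (Decode (text).SequenceEqual (data));   // True
    }
}
```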


XmlConvert 


If you're dealing with data that’s originated from or destined for an XML file,
XmlConvert (in the System.Xml namespace) provides the most suitable methods for
formatting and parsing. The methods in XmlConvert handle the nuances of XML
formatting without needing special format strings. For instance, true in XML is
true and not True. The .NET Framework internally uses XmlConvert extensively.
XmlConvert is also good for general-purpose culture-independent serialization.
The formatting methods in XmlConvert are all provided as overloaded ToString
methods; the parsing methods are called ToBoolean, ToDateTime, and so on: 


string s = XmlConvert.ToString (true);    // s = "true"
bool isTrue = XmlConvert.ToBoolean (s);


The methods that convert to and from DateTime accept an XmlDateTimeSerializationMode
argument. This is an enum with the following values:


Unspecified, Local, Utc, RoundtripKind 


Local and Utc cause a conversion to take place when formatting (if the DateTime is 
not already in that time zone). The time zone is then appended to the string: 


2010-02-22T14:08:30.9375          // Unspecified
2010-02-22T14:08:30.9375+09:00    // Local
2010-02-22T05:08:30.9375Z         // Utc


Unspecified strips away any time zone information embedded in the DateTime
(i.e., DateTimeKind) before formatting. RoundtripKind honors the DateTime’s
DateTimeKind, so when it’s reparsed, the resultant DateTime struct will be exactly
as it was originally.
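Here is a short sketch of a culture-independent round trip with XmlConvert, using RoundtripKind so that a UTC DateTime keeps its Kind. The Write and Read helper names are illustrative.

```csharp
using System;
using System.Xml;

public static class XmlConvertDemo
{
    public static string Write (DateTime dt) =>
        XmlConvert.ToString (dt, XmlDateTimeSerializationMode.RoundtripKind);

    public static DateTime Read (string s) =>
        XmlConvert.ToDateTime (s, XmlDateTimeSerializationMode.RoundtripKind);

    public static void Main()
    {
        Console.WriteLine (XmlConvert.ToString (true));   // true
        Console.WriteLine (XmlConvert.ToString (1.5));    // 1.5, whatever the current culture

        var utc = new DateTime (2010, 2, 22, 5, 8, 30, DateTimeKind.Utc);
        string s = Write (utc);                           // ends in "Z" because Kind is Utc
        DateTime back = Read (s);
        Console.WriteLine (back == utc && back.Kind == DateTimeKind.Utc);   // True
    }
}
```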


Type Converters 


Type converters are designed to format and parse in design-time environments. 
They also parse values in Extensible Application Markup Language (XAML) docu- 
ments—as used in WPF. 


In .NET Core, there are more than 100 type converters—covering such things as 
colors, images, and URIs. In contrast, format providers are implemented for only a 
handful of simple value types. 


Type converters typically parse strings in a variety of ways—without needing hints. 
For instance, in a WPF application in Visual Studio, if you assign a control a back- 
ground color by typing "Beige" into the appropriate property window, Color’s type 
converter figures out that you're referring to a color name and not an RGB string or 
system color. This flexibility can sometimes make type converters useful in contexts 
outside of designers and XAML documents. 


All type converters subclass TypeConverter in System.ComponentModel. To obtain a
TypeConverter, call TypeDescriptor.GetConverter. The following obtains a
TypeConverter for the Color type (in the System.Drawing namespace):


TypeConverter cc = TypeDescriptor.GetConverter (typeof (Color)); 


Among many other methods, TypeConverter defines methods to ConvertToString 
and ConvertFromString. We can call these as follows: 


Color beige = (Color) cc.ConvertFromString ("Beige"); 
Color purple = (Color) cc.ConvertFromString ("#800080"); 
Color window = (Color) cc.ConvertFromString ("Window"); 
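To keep the example free of any System.Drawing dependency, the following sketch applies the same TypeDescriptor.GetConverter pattern to built-in types instead. ParseWith is a hypothetical generic helper; ConvertFromInvariantString is a standard TypeConverter method that parses using the invariant culture.

```csharp
using System;
using System.ComponentModel;

public static class TypeConverterDemo
{
    // Hypothetical helper: parses text into any type that has a TypeConverter.
    public static T ParseWith<T> (string text)
    {
        TypeConverter converter = TypeDescriptor.GetConverter (typeof (T));
        return (T) converter.ConvertFromInvariantString (text);
    }

    public static void Main()
    {
        Console.WriteLine (ParseWith<int> ("42") + 1);                        // 43
        Console.WriteLine (ParseWith<bool> ("true"));                         // True
        Console.WriteLine (ParseWith<TimeSpan> ("01:30:00").TotalMinutes);    // 90
        Console.WriteLine (TypeDescriptor.GetConverter (typeof (int)).GetType().Name);
        // Int32Converter
    }
}
```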
By convention, type converters have names ending in Converter and are usually in
the same namespace as the type they’re converting. A type links to its converter via a 
TypeConverterAttribute, allowing designers to pick up converters automatically. 


Type converters can also provide design-time services such as generating standard 
value lists for populating a drop-down list in a designer or assisting with code 
serialization. 


BitConverter 


Most base types can be converted to a byte array, by calling BitConverter.GetBytes: 


foreach (byte b in BitConverter.GetBytes (3.5))
  Console.Write (b + " ");                    // 0 0 0 0 0 0 12 64


BitConverter also provides methods, such as ToDouble, for converting in the other 
direction. 


The decimal and DateTime(Offset) types are not supported by BitConverter. You 
can, however, convert a decimal to an int array by calling decimal.GetBits. To go 
the other way around, decimal provides a constructor that accepts an int array. 


In the case of DateTime, you can call ToBinary on an instance; this returns a long
(upon which you can then use BitConverter). The static DateTime.FromBinary
method does the reverse.
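The three binary round trips described above can be sketched as follows: double via BitConverter, decimal via GetBits and the int[] constructor, and DateTime via ToBinary and FromBinary. The helper names are illustrative.

```csharp
using System;

public static class BinaryDemo
{
    public static double RoundTripDouble (double d) =>
        BitConverter.ToDouble (BitConverter.GetBytes (d), 0);

    // BitConverter has no decimal overloads, so go via int[] instead.
    public static decimal RoundTripDecimal (decimal m) =>
        new decimal (decimal.GetBits (m));

    // BitConverter has no DateTime overloads, so go via long instead.
    public static DateTime RoundTripDateTime (DateTime dt)
    {
        byte[] bytes = BitConverter.GetBytes (dt.ToBinary());
        return DateTime.FromBinary (BitConverter.ToInt64 (bytes, 0));
    }

    public static void Main()
    {
        Console.WriteLine (RoundTripDouble (3.5) == 3.5);         // True
        Console.WriteLine (RoundTripDecimal (1.25m) == 1.25m);    // True

        var dt = new DateTime (2000, 1, 2, 17, 18, 19, DateTimeKind.Utc);
        DateTime back = RoundTripDateTime (dt);
        Console.WriteLine (back == dt && back.Kind == DateTimeKind.Utc);   // True
    }
}
```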


Globalization 


There are two aspects to internationalizing an application: globalization and 
localization. 


Globalization is concerned with three tasks (in decreasing order of importance): 


1. Making sure that your program doesn’t break when run in another culture 


2. Respecting a local culture’s formatting rules; for instance, when displaying 
dates 


3. Designing your program so that it picks up culture-specific data and strings 
from satellite assemblies that you can later write and deploy 


Localization means concluding that last task by writing satellite assemblies for spe- 
cific cultures. You can do this after writing your program (we cover the details in 
“Resources and Satellite Assemblies” on page 768 in Chapter 18). 


.NET Core helps you with the second task by applying culture-specific rules by
default. We've already seen how calling ToString on a DateTime or number respects
local formatting rules. Unfortunately, this makes it easy to fail the first task and have
your program break because you’re expecting dates or numbers to be formatted
according to an assumed culture. The solution, as we've seen, is either to specify a
culture (such as the invariant culture) when formatting and parsing, or to use
culture-independent methods such as those in XmlConvert.


Globalization Checklist 


We've already covered the important points in this chapter. Here’s a summary of the 
essential work required: 


• Understand Unicode and text encodings (see “Text Encodings and Unicode” on
page 253).


• Be mindful that methods such as ToUpper and ToLower on char and string are
culture sensitive: use ToUpperInvariant/ToLowerInvariant unless you want
culture sensitivity.


• Favor culture-independent formatting and parsing mechanisms for DateTime
and DateTimeOffsets such as ToString("o") and XmlConvert.


• Otherwise, specify a culture when formatting/parsing numbers or date/times
(unless you want local-culture behavior).
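To make the checklist concrete without relying on the operating system's culture data, the sketch below builds a hypothetical "comma culture" by cloning the invariant culture and swapping its decimal and group separators (mimicking many European cultures). It shows display text honoring that culture, storage text staying invariant, and what goes wrong when invariant text is parsed with the wrong culture.

```csharp
using System;
using System.Globalization;

public static class CultureSafeDemo
{
    // Hypothetical culture whose decimal separator is a comma and whose
    // group separator is a period. Cloning makes a writable copy.
    public static CultureInfo CommaCulture()
    {
        var c = (CultureInfo) CultureInfo.InvariantCulture.Clone();
        c.NumberFormat.NumberDecimalSeparator = ",";
        c.NumberFormat.NumberGroupSeparator = ".";
        return c;
    }

    // Storage/wire format: always invariant, whatever the current culture.
    public static string Serialize (double value) =>
        value.ToString ("R", CultureInfo.InvariantCulture);

    public static double Deserialize (string text) =>
        double.Parse (text, CultureInfo.InvariantCulture);

    public static void Main()
    {
        CultureInfo comma = CommaCulture();
        Console.WriteLine (3.5.ToString (comma));                   // 3,5  (display)
        Console.WriteLine (Serialize (3.5));                        // 3.5  (storage)
        Console.WriteLine (Deserialize (Serialize (3.5)) == 3.5);   // True

        // The trap: "." is read as a group separator, so this yields 35, not 3.5.
        Console.WriteLine (double.Parse ("3.5", comma) == 35);      // True
    }
}
```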


Testing 


You can test against different cultures by reassigning Thread’s CurrentCulture 
property (in System.Threading). The following changes the current culture to 
Turkey: 


Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo ("tr-TR"); 


Turkey is a particularly good test case because: 


• "i".ToUpper() != "I" and "I".ToLower() != "i".

• Dates are formatted as day.month.year (note the period separator).

• The decimal point indicator is a comma instead of a period.


You can also experiment by changing the number and date formatting settings in
the Windows Control Panel: these are reflected in the default culture
(CultureInfo.CurrentCulture).


CultureInfo.GetCultures() returns an array of all available cultures. 


Thread and CultureInfo also support a CurrentUICulture 
property. This is concerned more with localization, which we 
cover in Chapter 18. 
Working with Numbers


Conversions 


We covered numeric conversions in previous chapters and sections; Table 6-7 sum- 
marizes all of the options. 


Table 6-7. Summary of numeric conversions


Task                             Functions             Examples

Parsing base-10 numbers          Parse                 double d = double.Parse ("3.5");
                                 TryParse              int i;
                                                       bool ok = int.TryParse ("3", out i);
Parsing from base 2, 8, or 16    Convert.ToIntegral    int i = Convert.ToInt32 ("1E", 16);
Formatting to hexadecimal        ToString ("X")        string hex = 45.ToString ("X");
Lossless numeric conversion      Implicit cast         int i = 23;
                                                       double d = i;
Truncating numeric               Explicit cast         double d = 23.5;
conversion                                             int i = (int) d;
Rounding numeric conversion      Convert.ToIntegral    double d = 23.5;
(real to integral)                                     int i = Convert.ToInt32 (d);
Math 


Table 6-8 lists the key members of the static Math class. The trigonometric functions 
accept arguments of type double; other methods such as Max are overloaded to 
operate on all numeric types. The Math class also defines the mathematical constants 
E (e) and PI. 


Table 6-8. Methods in the static Math class 


Category                   Methods

Rounding                   Round, Truncate, Floor, Ceiling
Maximum/minimum            Max, Min
Absolute value and sign    Abs, Sign
Square root                Sqrt
Raising to a power         Pow, Exp
Logarithm                  Log, Log10
Trigonometric              Sin, Cos, Tan,
                           Sinh, Cosh, Tanh,
                           Asin, Acos, Atan
The Round method lets you specify the number of decimal places with which to
round as well as how to handle midpoints (away from zero, or with banker’s
rounding). Floor and Ceiling round to the nearest integer: Floor always rounds down
and Ceiling always rounds up, even with negative numbers.
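A few of the rounding methods side by side. RoundTo is a hypothetical helper that always rounds midpoints away from zero; Show prints with the invariant culture so the commented output is the same everywhere.

```csharp
using System;
using System.Globalization;

public static class MathDemo
{
    // Rounds to the given number of decimal places, midpoints away from zero.
    public static double RoundTo (double value, int digits) =>
        Math.Round (value, digits, MidpointRounding.AwayFromZero);

    static void Show (double x) =>
        Console.WriteLine (x.ToString (CultureInfo.InvariantCulture));

    public static void Main()
    {
        Show (Math.Round (2.5));       // 2    (default is banker's rounding)
        Show (RoundTo (2.5, 0));       // 3
        Show (RoundTo (3.14159, 2));   // 3.14
        Show (Math.Floor (-2.5));      // -3   (always rounds down)
        Show (Math.Ceiling (-2.5));    // -2   (always rounds up)
        Show (Math.Truncate (-2.5));   // -2   (drops the fraction)
    }
}
```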


Max and Min accept only two arguments. If you have an array or sequence of
numbers, use the Max and Min extension methods in System.Linq.Enumerable.


BigInteger 


The BigInteger struct is a specialized numeric type. It resides in the 
System.Numerics namespace and allows you to represent an arbitrarily large integer 
without any loss of precision. 


C# doesn’t provide native support for BigInteger, so there's no way to represent 
BigInteger literals. You can, however, implicitly convert from any other integral 
type to a BigInteger: 


BigInteger twentyFive = 25; // implicit conversion from integer 


To represent a bigger number, such as one googol (10^100), you can use one of
BigInteger’s static methods, such as Pow (raise to the power):


BigInteger googol = BigInteger.Pow (10, 100); 
Alternatively, you can Parse a string: 

BigInteger googol = BigInteger.Parse ("1".PadRight (101, '0')); 
Calling ToString() on this prints every digit: 


Console.WriteLine (googol.ToString());  // 10000000000...0000 (1 followed by 100 zeros)


You can perform potentially lossy conversions between BigInteger and the
standard numeric types by using the explicit cast operator:

double g2 = (double) googol;        // Explicit cast
BigInteger g3 = (BigInteger) g2;    // Explicit cast
Console.WriteLine (g3);


The output from this demonstrates the loss of precision: 
9999999999999999673361688041166912... 


BigInteger overloads all the arithmetic operators including remainder (%) as well 
as the comparison and equality operators. 
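Because BigInteger overloads the arithmetic operators, ordinary-looking code works with it unchanged. The hypothetical Factorial method below would overflow a long beyond 20!, but not a BigInteger.

```csharp
using System;
using System.Numerics;

public static class BigIntegerDemo
{
    // Computes n! exactly.
    public static BigInteger Factorial (int n)
    {
        BigInteger result = BigInteger.One;
        for (int i = 2; i <= n; i++)
            result *= i;               // int converts implicitly to BigInteger
        return result;
    }

    public static void Main()
    {
        Console.WriteLine (Factorial (20));                  // 2432902008176640000
        Console.WriteLine (Factorial (25) > long.MaxValue);  // True
        Console.WriteLine (BigInteger.Pow (2, 64) == (BigInteger) ulong.MaxValue + 1);  // True
    }
}
```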


You can also construct a BigInteger from a byte array. The following code gener- 
ates a 32-byte random number suitable for cryptography and then assigns it to a 
BigInteger: 

// This uses the System.Security.Cryptography namespace: 


RandomNumberGenerator rand = RandomNumberGenerator.Create();
byte[] bytes = new byte [32];
rand.GetBytes (bytes);
var bigRandomNumber = new BigInteger (bytes);   // Convert to BigInteger


The advantage of storing such a number in a BigInteger over a byte array is that 
you get value-type semantics. Calling ToByteArray converts a BigInteger back to a 
byte array. 


Complex 


The Complex struct is another specialized numeric type that represents complex
numbers with real and imaginary components of type double. Complex resides in
the System.Numerics namespace (along with BigInteger).


To use Complex, instantiate the struct, specifying the real and imaginary values: 


var c1 = new Complex (2, 3.5);
var c2 = new Complex (3, 0); 


There are also implicit conversions from the standard numeric types. 


The Complex struct exposes properties for the real and imaginary values as well as 
the phase and magnitude: 


Console.WriteLine (c1.Real); // 2 
Console.WriteLine (c1.Imaginary); // 3.5 
Console.WriteLine (c1.Phase); // 1.05165021254837 


Console.WriteLine (c1.Magnitude); // 4.03112887414927 

You can also construct a Complex number by specifying magnitude and phase: 
Complex c3 = Complex.FromPolarCoordinates (1.3, 5); 

The standard arithmetic operators are overloaded to work on Complex numbers: 


Console.WriteLine (c1 + c2);    // (5, 3.5)
Console.WriteLine (c1 * c2); // (6, 10.5) 


The Complex struct exposes static methods for more advanced functions, including 
the following: 

• Trigonometric (Sin, Asin, Sinh, Tan, etc.)

• Logarithms and exponentiations

• Conjugate
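A brief sketch of Conjugate in action: multiplying a complex number by its conjugate yields a real number equal to the square of its magnitude. TimesConjugate is a hypothetical helper name.

```csharp
using System;
using System.Numerics;

public static class ComplexDemo
{
    // c * conj(c) == (re*re + im*im, 0)
    public static Complex TimesConjugate (Complex c) => c * Complex.Conjugate (c);

    public static void Main()
    {
        var c1 = new Complex (2, 3.5);
        Complex p = TimesConjugate (c1);
        Console.WriteLine (p.Real == 16.25 && p.Imaginary == 0);    // True

        // The imaginary unit squared is -1.
        Complex i2 = Complex.ImaginaryOne * Complex.ImaginaryOne;
        Console.WriteLine (i2.Real == -1 && i2.Imaginary == 0);     // True

        // Magnitude is the square root of that real product.
        Console.WriteLine (Math.Abs (c1.Magnitude - Math.Sqrt (p.Real)) < 1e-12);   // True
    }
}
```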


Random 


The Random class generates a pseudorandom sequence of bytes, integers, or 
doubles. 


To use Random, you first instantiate it, optionally providing a seed to initiate the
random number series. Using the same seed guarantees the same series of numbers (if
run under the same CLR version), which is sometimes useful when you want
reproducibility:

Random r1 = new Random (1); 

Random r2 = new Random (1); 


Console.WriteLine (r1.Next (100) + ", " + r1.Next (100)); // 24, 11 
Console.WriteLine (r2.Next (100) + ", " + r2.Next (100)); // 24, 11 


If you don’t want reproducibility, you can construct Random with no seed; in that 
case, it uses the current system time to make one up. 


Because the system clock has limited granularity, two Random 
instances created close together (typically within 10 ms) will 
yield the same sequence of values. A common trap is to 
instantiate a new Random object every time you need a random 
number rather than reusing the same object. 


A good pattern is to declare a single static Random instance. In 
multithreaded scenarios, however, this can cause trouble 
because Random objects are not thread-safe. We describe a 
workaround in “Thread-Local Storage” on page 914. 


Calling Next(n) generates a random integer between 0 and n - 1. NextDouble gen- 
erates a random double between 0 and 1. NextBytes fills a byte array with random 
values. 


Random is not considered random enough for high-security applications such as
cryptography. For this, .NET Core provides a cryptographically strong random number
generator, in the System.Security.Cryptography namespace. Here’s how to use
it:

var rand = System.Security.Cryptography.RandomNumberGenerator.Create();
byte[] bytes = new byte [32];
rand.GetBytes (bytes);    // Fill the byte array with random numbers.
The downside is that it’s less flexible: filling a byte array is the only means of obtain- 
ing random numbers. To obtain an integer, you must use BitConverter: 

byte[] bytes = new byte [4]; 


rand.GetBytes (bytes); 
int i = BitConverter.ToInt32 (bytes, 0); 
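The following sketch collects the practical advice above: a single shared Random instance, Next (minValue, maxValue) for a bounded integer, and reproducible sequences from a fixed seed. As an alternative to the BitConverter approach, it also uses RandomNumberGenerator.GetInt32, which (to the best of our knowledge) requires .NET Core 3.0 or later. RollDie, SecureRollDie, and SameSequence are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;

public static class RandomDemo
{
    // One shared instance avoids the "same seed" trap described above.
    // Random isn't thread-safe, so this suits single-threaded code only.
    static readonly Random Shared = new Random();

    // Next (minValue, maxValue) includes minValue but excludes maxValue.
    public static int RollDie() => Shared.Next (1, 7);

    // Cryptographically strong variant (.NET Core 3.0 or later).
    public static int SecureRollDie() => RandomNumberGenerator.GetInt32 (1, 7);

    // Two generators with the same seed produce the same sequence.
    public static bool SameSequence (int seed, int count)
    {
        var a = new Random (seed);
        var b = new Random (seed);
        for (int i = 0; i < count; i++)
            if (a.Next() != b.Next()) return false;
        return true;
    }

    public static void Main()
    {
        Console.WriteLine (RollDie());              // some value from 1 to 6
        Console.WriteLine (SecureRollDie());        // some value from 1 to 6
        Console.WriteLine (SameSequence (1, 100));  // True
    }
}
```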


Enums 


In Chapter 3, we described C#’s enum type, and showed how to combine members,
test equality, use logical operators, and perform conversions. .NET extends C#’s
support for enums through the System.Enum type. This type has two roles:


• Providing type unification for all enum types


• Defining static utility methods










Type unification means that you can implicitly cast any enum member to a
System.Enum instance:


enum Nut  { Walnut, Hazelnut, Macadamia }
enum Size { Small, Medium, Large }

static void Main()
{
  Display (Nut.Macadamia);     // Nut.Macadamia
  Display (Size.Large);        // Size.Large
}

static void Display (Enum value)
{
  Console.WriteLine (value.GetType().Name + "." + value.ToString());
}


The static utility methods on System.Enum are primarily related to performing
conversions and obtaining lists of members.


Enum Conversions 


There are three ways to represent an enum value: 


• As an enum member
• As its underlying integral value
• As a string


In this section, we describe how to convert between each. 


Enum-to-integral conversions 


Recall that an explicit cast converts between an enum member and its integral value. 
An explicit cast is the correct approach if you know the enum type at compile time: 


[Flags]
public enum BorderSides { Left=1, Right=2, Top=4, Bottom=8 }

int i = (int) BorderSides.Top;            // i == 4
BorderSides side = (BorderSides) i;       // side == BorderSides.Top


You can cast a System.Enum instance to its integral type in the same way. The trick is
to first cast to an object and then the integral type:


static int GetIntegralValue (Enum anyEnum)
{
  return (int) (object) anyEnum;
}


This relies on you knowing the integral type: the method we just wrote would crash
if passed an enum whose integral type was long. To write a method that works with
an enum of any integral type, you can take one of three approaches. The first is to call
Convert.ToDecimal:


static decimal GetAnyIntegralValue (Enum anyEnum)
{
  return Convert.ToDecimal (anyEnum);
}


This works because every integral type (including ulong) can be converted to
decimal without loss of information. The second approach is to call
Enum.GetUnderlyingType in order to obtain the enum’s integral type, and then call
Convert.ChangeType:


static object GetBoxedIntegralValue (Enum anyEnum)
{
  Type integralType = Enum.GetUnderlyingType (anyEnum.GetType());
  return Convert.ChangeType (anyEnum, integralType);
}


This preserves the original integral type, as the following example shows: 


object result = GetBoxedIntegralValue (BorderSides.Top);
Console.WriteLine (result);               // 4
Console.WriteLine (result.GetType());     // System.Int32


Our GetBoxedIntegralValue method in fact performs no value
conversion; rather, it reboxes the same value in another type. It
translates an integral value in enum-type clothing to an
integral value in integral-type clothing. We describe this further in
“How Enums Work” on page 294.


The third approach is to call Format or ToString specifying the "d" or "D" format
string. This gives you the enum’s integral value as a string, and it is useful when
writing custom serialization formatters:

static string GetIntegralValueAsString (Enum anyEnum)
{
  return anyEnum.ToString ("D");    // returns something like "4"
}


Integral-to-enum conversions 
Enum.ToObject converts an integral value to an enum instance of the given type:


object bs = Enum.ToObject (typeof (BorderSides), 3); 
Console.WriteLine (bs); // Left, Right 


This is the dynamic equivalent of the following: 
BorderSides bs = (BorderSides) 3; 


ToObject is overloaded to accept all integral types as well as object. (The latter 
works with any boxed integral type.) 
String conversions


To convert an enum to a string, you can either call the static Enum.Format method or
call ToString on the instance. Each method accepts a format string, which can be 
"G" for default formatting behavior, "D" to emit the underlying integral value as a 
string, "X" for the same in hexadecimal, or "F" to format combined members of an 
enum without the Flags attribute. We listed examples of these in “Standard Format 
Strings and Parsing Flags” on page 275. 


Enum.Parse converts a string to an enum. It accepts the enum type and a string that 
can include multiple members: 


BorderSides leftRight = (BorderSides) Enum.Parse (typeof (BorderSides), 
"Left, Right"); 


An optional third argument lets you perform case-insensitive parsing. An
ArgumentException is thrown if the member is not found.
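When an unknown name is an expected input rather than a bug, the generic Enum.TryParse avoids the exception. The following sketch uses its (string, bool ignoreCase, out TEnum) overload; TryParseSides is a hypothetical helper, and BorderSides is redeclared so the block stands alone. One caveat worth knowing: numeric strings such as "5" also parse successfully, even if no member has that value.

```csharp
using System;

public static class EnumParseDemo
{
    [Flags]
    public enum BorderSides { Left = 1, Right = 2, Top = 4, Bottom = 8 }

    // Non-throwing, case-insensitive parse; null if a name isn't recognized.
    public static BorderSides? TryParseSides (string text) =>
        Enum.TryParse (text, true, out BorderSides result)
            ? result
            : (BorderSides?) null;

    public static void Main()
    {
        Console.WriteLine (TryParseSides ("left, right"));      // Left, Right
        Console.WriteLine (TryParseSides ("Middle") == null);   // True
    }
}
```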


Enumerating Enum Values 
Enum.GetValues returns an array comprising all members of a particular enum type:


foreach (Enum value in Enum.GetValues (typeof (BorderSides))) 
Console.WriteLine (value); 


Composite members such as LeftRight = Left | Right are included, too. 
Enum.GetNames performs the same function, but returns an array of strings. 


Internally, the CLR implements GetValues and GetNames by 
reflecting over the fields in the enum’s type. The results are 
cached for efficiency. 


How Enums Work 


The semantics of enums are enforced largely by the compiler. In the CLR, there's no 
runtime difference between an enum instance (when unboxed) and its underlying 
integral value. Further, an enum definition in the CLR is merely a subtype of 
System.Enum with static integral-type fields for each member. This makes the ordi- 
nary use of an enum highly efficient, with a runtime cost matching that of integral 
constants. 


The downside of this strategy is that enums can provide static but not strong type 
safety. We saw an example of this in Chapter 3: 


[Flags] public enum BorderSides { Left=1, Right=2, Top=4, Bottom=8 } 


BorderSides b = BorderSides.Left; 
b += 1234; // No error! 


When the compiler is unable to perform validation (as in this example), there’s no 
backup from the runtime to throw an exception. 
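Because the runtime won't catch an out-of-range value, validation is up to you. For a flags enum, Enum.IsDefined is not enough on its own: it rejects legitimate combinations such as Left | Right. The sketch below instead masks off the declared bits; IsValid and the All constant are hypothetical names, and BorderSides is redeclared so the block stands alone.

```csharp
using System;

public static class EnumValidationDemo
{
    [Flags]
    public enum BorderSides { Left = 1, Right = 2, Top = 4, Bottom = 8 }

    const BorderSides All = BorderSides.Left | BorderSides.Right
                          | BorderSides.Top  | BorderSides.Bottom;

    // True if the value uses only declared flag bits.
    public static bool IsValid (BorderSides value) => (value & ~All) == 0;

    public static void Main()
    {
        BorderSides b = BorderSides.Left;
        b += 1234;                                                    // compiles; value is now 1235
        Console.WriteLine (IsValid (b));                              // False
        Console.WriteLine (IsValid (BorderSides.Left | BorderSides.Right));   // True
        Console.WriteLine (Enum.IsDefined (typeof (BorderSides),
                           BorderSides.Left | BorderSides.Right));    // False
    }
}
```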
What we said about there being no runtime difference between an enum instance
and its integral value might seem at odds with the following:


[Flags] public enum BorderSides { Left=1, Right=2, Top=4, Bottom=8 } 


Console.WriteLine (BorderSides.Right.ToString()); // Right 
Console.WriteLine (BorderSides.Right.GetType().Name);    // BorderSides

Given the nature of an enum instance at runtime, you’d expect this to print 2 and
Int32! The reason for its behavior comes down to some more compile-time trickery.
C# explicitly boxes an enum instance before calling methods such as ToString or
GetType, which are defined on the reference types System.Enum and System.Object.
And when an enum instance is boxed, it gains a runtime wrapping that references its
enum type.
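The following sketch separates the two views of the same value: a cast to int exposes the raw integral value, while boxing (which happens on assignment to object) carries the enum type along. Enum.GetUnderlyingType is used here only to confirm the underlying type:

```csharp
using System;

public enum BorderSides { Left = 1, Right = 2, Top = 4, Bottom = 8 }

public static class Program
{
  static void Main()
  {
    BorderSides side = BorderSides.Right;

    int raw = (int) side;        // Just the underlying integral value
    object boxed = side;         // Boxing records the enum type

    Console.WriteLine (raw);                                             // 2
    Console.WriteLine (boxed.GetType().Name);                            // BorderSides
    Console.WriteLine (Enum.GetUnderlyingType (boxed.GetType()).Name);   // Int32
  }
}
```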


The Guid Struct 


The Guid struct represents a globally unique identifier: a 16-byte value that, when 
generated, is almost certainly unique in the world. Guids are often used for keys of 
various sorts, in applications and databases. There are $2^{128}$, or about $3.4 \times 10^{38}$,
unique Guids.


The static Guid.NewGuid method generates a unique Guid:


Guid g = Guid.NewGuid (); 

Console.WriteLine (g.ToString());    // 0d57629c-7d6e-4847-97cb-9e2fc25083fe

To instantiate an existing value, you use one of the constructors. The two most useful
constructors are:

public Guid (byte[] b); // Accepts a 16-byte array 

public Guid (string g); // Accepts a formatted string 
When represented as a string, a Guid is formatted as a 32-digit hexadecimal number,
with optional hyphens after the 8th, 12th, 16th, and 20th digits. The whole string
can also be optionally wrapped in parentheses or braces:


Guid g1 = new Guid ("{0d57629c-7d6e-4847-97cb-9e2fc25083fe}"); 
Guid g2 = new Guid ("0d57629c7d6e484797cb9e2fc25083fe");
Console.WriteLine (g1 == g2); // True 


Being a struct, a Guid honors value-type semantics; hence, the equality operator 
works in the preceding example. 


The ToByteArray method converts a Guid to a byte array. 


The static Guid.Empty property returns an empty Guid (all zeros). This is often used
in place of null. 
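Here is a short sketch of ToByteArray and Guid.Empty. It relies only on the byte[] constructor shown earlier accepting the output of ToByteArray, so the round trip reproduces the original value:

```csharp
using System;

public static class Program
{
  static void Main()
  {
    Guid g = Guid.NewGuid();

    // ToByteArray yields 16 bytes; the byte[] constructor turns them back into a Guid
    byte[] bytes = g.ToByteArray();
    Guid roundTrip = new Guid (bytes);
    Console.WriteLine (bytes.Length);                   // 16
    Console.WriteLine (roundTrip == g);                 // True

    // Guid.Empty is all zeros (the same value as default (Guid))
    Console.WriteLine (Guid.Empty);                     // 00000000-0000-0000-0000-000000000000
    Console.WriteLine (Guid.Empty == default (Guid));   // True
  }
}
```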
Equality Comparison


Until now, we've assumed that the == and != operators are all there is to equality 
comparison. The issue of equality, however, is more complex and subtler, sometimes 
requiring the use of additional methods and interfaces. This section explores the 
standard C# and .NET protocols for equality, focusing particularly on two 
questions: 


• When are == and != adequate (and inadequate) for equality comparison, and
what are the alternatives?


• How and when should you customize a type’s equality logic?


But before exploring the details of equality protocols and how to customize them, 
we first must look at the preliminary concept of value versus referential equality. 


Value Versus Referential Equality 
There are two kinds of equality: 


Value equality 
Two values are equivalent in some sense. 


Referential equality 
Two references refer to exactly the same object. 


Unless overridden: 


• Value types use value equality.


• Reference types use referential equality.


Value types, in fact, can use only value equality (unless boxed). A simple demonstration
of value equality is to compare two numbers:


int x = 5, y = 5;
Console.WriteLine (x == y); // True (by virtue of value equality) 


A more elaborate demonstration is to compare two DateTimeOffset structs. The 
following prints True because the two DateTimeOffsets refer to the same point in 
time and so are considered equivalent: 


var dt1 = new DateTimeOffset (2010, 1, 1, 1, 1, 1, TimeSpan.FromHours(8)); 
var dt2 = new DateTimeOffset (2010, 1, 1, 2, 1, 1, TimeSpan.FromHours(9)); 
Console.WriteLine (dt1 == dt2); // True 


DateTimeOffset is a struct whose equality semantics have 
been tweaked. By default, structs exhibit a special kind of 
value equality called structural equality in which two values 
are considered equal if all of their members are equal. (You 
can see this by creating a struct and calling its Equals method; 
more on this later.) 
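To see the default structural equality that the note describes, consider a struct that overrides nothing. Point is a hypothetical example type:

```csharp
using System;

// Hypothetical struct with no equality overrides of its own
public struct Point
{
  public int X, Y;
  public Point (int x, int y) { X = x; Y = y; }
}

public static class Program
{
  static void Main()
  {
    var p1 = new Point (3, 4);
    var p2 = new Point (3, 4);
    var p3 = new Point (4, 3);

    // Equals is inherited from ValueType and compares the fields
    Console.WriteLine (p1.Equals (p2));   // True
    Console.WriteLine (p1.Equals (p3));   // False

    // p1 == p2 would not compile: a struct gets no == operator by default
  }
}
```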
Reference types exhibit referential equality by default. In the following example, f1
and f2 are not equal, despite their objects having identical content: 


class Foo { public int X; } 


Foo f1 = new Foo { X = 5 }; 
Foo f2 = new Foo { X = 5 }; 
Console.WriteLine (f1 == f2); // False 


In contrast, f3 and f1 are equal because they reference the same object: 


Foo f3 = f1; 

Console.WriteLine (f1 == f3); // True 
Later in this section, we explain how you can customize reference types to exhibit 
value equality. An example of this is the Uri class in the System namespace: 


Uri uri1 = new Uri ("http://www.linqpad.net");
Uri uri2 = new Uri ("http://www.linqpad.net");
Console.WriteLine (uri1 == uri2);              // True


The string class exhibits similar behavior: 


var s1 = "http://www.linqpad.net";
var s2 = "http://" + "www.linqpad.net";
Console.WriteLine (s1 == s2); // True 


Standard Equality Protocols 


There are three standard protocols that types can implement for equality 
comparison: 


• The == and != operators
• The virtual Equals method in object


• The IEquatable<T> interface


In addition, there are the pluggable protocols and the IStructuralEquatable inter- 
face, which we describe in Chapter 7. 


== and !=


We've already seen in many examples how the standard == and != operators perform
equality/inequality comparisons. The subtleties with == and != arise because
they are operators; thus, they are statically resolved (in fact, they are implemented as 
static functions). So, when you use == or !=, C# makes a compile-time decision as 
to which type will perform the comparison, and no virtual behavior comes into 
play. This is normally desirable. In the following example, the compiler hardwires 
== to the int type because x and y are both int: 


int x = 5; 
int y = 5; 
Console.WriteLine (x == y); // True 
But in the next example, the compiler wires the == operator to the object type:


object x = 5; 

object y = 5; 

Console.WriteLine (x == y); // False 
Because object is a class (and so a reference type), object’s == operator uses referential
equality to compare x and y. The result is false because x and y refer to different
boxed objects on the heap.


The virtual object.Equals method 


To correctly equate x and y in the preceding example, we can use the virtual Equals 
method. Equals is defined in System.Object and so is available to all types: 

object x = 5; 

object y = 5; 

Console.WriteLine (x.Equals (y)); // True 
Equals is resolved at runtime—according to the object’s actual type. In this case, it 
calls Int32’s Equals method, which applies value equality to the operands, returning 
true. With reference types, Equals performs referential equality comparison by 
default; with structs, Equals performs structural comparison by calling Equals on 
each of its fields. 





Why the Complexity? 


You might wonder why the designers of C# didn’t avoid the problem by making == 
virtual and thus functionally identical to Equals. There are three reasons for this: 


• If the first operand is null, Equals fails with a NullReferenceException; a static
operator does not.


• Because the == operator is statically resolved, it executes extremely quickly.
This means that you can write computationally intensive code without
penalty, and without needing to learn another language such as C++.


• Sometimes it can be useful to have == and Equals apply different definitions of
equality. We describe this scenario later in this section.


Essentially, the complexity of the design reflects the complexity of the situation: the 
concept of equality covers a multitude of scenarios. 











Hence, Equals is suitable for equating two objects in a type-agnostic fashion. The 
following method equates two objects of any type: 


public static bool AreEqual (object obj1, object obj2) 
=> obj1.Equals (obj2); 


There is one case, however, in which this fails. If the first argument is null, you get 
a NullReferenceException. Here’s the fix: 
public static bool AreEqual (object obj1, object obj2)


{ 
if (obj1 == null) return obj2 == null; 
return obj1.Equals (obj2); 

} 


Or, more succinctly: 
public static bool AreEqual (object obj1, object obj2) 
=> obj1 == null ? obj2 == null : obj1.Equals (obj2); 
The static object.Equals method 


The object class provides a static helper method that does the work of AreEqual in 
the preceding example. Its name is Equals—just like the virtual method—but there's 
no conflict because it accepts two arguments: 


public static bool Equals (object objA, object objB) 


This provides a null-safe equality comparison algorithm for when the types are 
unknown at compile time: 


object x = 3, y = 3; 
Console.WriteLine (object.Equals (x, y)); // True 


x = null;
Console.WriteLine (object.Equals (x, y)); // False 
y = null; 


Console.WriteLine (object.Equals (x, y)); // True 


A useful application is when writing generic types. The following code will not compile
if object.Equals is replaced with the == or != operator:


class Test <T> 
{ 
T _value; 
public void SetValue (T newValue) 
{ 
if (!object.Equals (newValue, _value)) 
{ 
_value = newValue; 
OnValueChanged(); 
} 
} 
protected virtual void OnValueChanged() { ... } 


} 


Operators are prohibited here because the compiler cannot bind to the static 
method of an unknown type. 







A more elaborate way to implement this comparison is with 
the EqualityComparer<T> class. This has the advantage of 
avoiding boxing: 

if (!EqualityComparer<T>.Default.Equals (newValue, _value)) 
We discuss EqualityComparer<T> in more detail in Chapter 7 
(see “Plugging in Equality and Order” on page 360). 
The static object.ReferenceEquals method


Occasionally, you need to force referential equality comparison. The static
object.ReferenceEquals method does just that:


class Widget { ... } 


class Test
{
  static void Main()
  {
    Widget w1 = new Widget();
    Widget w2 = new Widget();
    Console.WriteLine (object.ReferenceEquals (w1, w2));   // False
  }
}


You might want to do this because it’s possible for Widget to override the virtual
Equals method such that w1.Equals(w2) would return true. Further, it’s possible
for Widget to overload the == operator so that w1 == w2 would also return true. In
such cases, calling object.ReferenceEquals guarantees normal referential equality
semantics.


Another way to force referential equality comparison is to cast 
the values to object and then apply the == operator. 
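Here is a sketch of the scenario just described, using a hypothetical Widget that overrides Equals and overloads == to compare a Name field. Only object.ReferenceEquals (or casting to object before applying ==) still reports that w1 and w2 are distinct objects:

```csharp
using System;

// Hypothetical Widget whose Equals and == apply value equality on Name
public class Widget
{
  public string Name;
  public Widget (string name) => Name = name;

  public override bool Equals (object obj) => obj is Widget w && w.Name == Name;
  public override int GetHashCode() => Name?.GetHashCode() ?? 0;

  // "is null" avoids recursively calling this operator
  public static bool operator == (Widget a, Widget b)
    => a is null ? b is null : a.Equals (b);
  public static bool operator != (Widget a, Widget b) => !(a == b);
}

public static class Program
{
  static void Main()
  {
    var w1 = new Widget ("gear");
    var w2 = new Widget ("gear");

    Console.WriteLine (w1 == w2);                         // True  (overloaded ==)
    Console.WriteLine (w1.Equals (w2));                   // True  (overridden Equals)
    Console.WriteLine (object.ReferenceEquals (w1, w2));  // False (different objects)
    Console.WriteLine ((object) w1 == (object) w2);       // False (object's == is referential)
  }
}
```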


The IEquatable<T> interface


A consequence of calling object.Equals is that it forces boxing on value types. This
is undesirable in highly performance-sensitive scenarios because boxing is relatively 
expensive compared to the actual comparison. A solution was introduced in C# 2.0, 
with the IEquatable<T> interface: 


public interface IEquatable<T> 


{ 
bool Equals (T other); 


} 


The idea is that IEquatable<T>, when implemented, gives the same result as calling
object’s virtual Equals method, but more quickly. Most basic .NET types implement
IEquatable<T>. You can use IEquatable<T> as a constraint in a generic type:


class Test<T> where T : IEquatable<T> 


{ 
public bool IsEqual (T a, T b) 
{ 
return a.Equals (b); // No boxing with generic T 
} 
} 
If we remove the generic constraint, the class would still compile, but a.Equals(b)
would instead bind to the slower object.Equals (slower assuming T was a value
type).


When Equals and == are not equal 


We said earlier that it’s sometimes useful for == and Equals to apply different definitions
of equality. For example:


double x = double.NaN; 
Console.WriteLine (x == x); // False 
Console.WriteLine (x.Equals (x)); // True 


The double type’s == operator enforces that one NaN can never equal anything else— 
even another NaN. This is most natural from a mathematical perspective, and it 
reflects the underlying CPU behavior. The Equals method, however, is obliged to 
apply reflexive equality; in other words: 


• x.Equals (x) must always return true.


Collections and dictionaries rely on Equals behaving this way; otherwise, they 
could not find an item they previously stored. 
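A small illustration of why collections depend on reflexive Equals: List<T>.Contains locates NaN because it compares through Equals (via EqualityComparer<double>.Default), whereas a hand-written loop using == never finds it:

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
  static void Main()
  {
    var values = new List<double> { 1.0, double.NaN };

    // Contains compares with Equals, and double.Equals treats NaN as equal to itself
    Console.WriteLine (values.Contains (double.NaN));   // True

    // A search using == can never match NaN
    bool found = false;
    foreach (double d in values)
      if (d == double.NaN) found = true;
    Console.WriteLine (found);                          // False
  }
}
```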


Having Equals and == apply different definitions of equality is actually quite rare 
with value types. A more common scenario is with reference types; this happens 
when the author customizes Equals so that it performs value equality while leaving 
== to perform (default) referential equality. The StringBuilder class does exactly 
that: 


var sb1 = new StringBuilder ("foo"); 

var sb2 = new StringBuilder ("foo"); 

Console.WriteLine (sb1 == sb2); // False (referential equality) 
Console.WriteLine (sb1.Equals (sb2)); // True (value equality) 


Let’s now look at how to customize equality. 


Equality and Custom Types 


Recall default equality comparison behavior: 


• Value types use value equality.


• Reference types use referential equality.
Further: 


• A struct’s Equals method applies structural value equality by default (i.e., it
compares each field in the struct).


Sometimes, it makes sense to override this behavior when writing a type. There are 
two cases for doing so: 
• To change the meaning of equality


• To speed up equality comparisons for structs


Changing the meaning of equality 


Changing the meaning of equality makes sense when the default behavior of == and 
Equals is unnatural for your type and is not what a consumer would expect. An 
example is DateTimeOffset, a struct with two private fields: a UTC DateTime and a 
numeric integer offset. If you were writing this type, you’d probably want to ensure
that equality comparisons considered only the UTC DateTime field and not the offset
field. Another example is numeric types that support NaN values such as float
and double. If you were implementing such types yourself, you’d want to ensure
that NaN-comparison logic was supported in equality comparisons.


With classes, it’s sometimes more natural to offer value equality as the default 
instead of referential equality. This is often the case with small classes that hold a 
simple piece of data, such as System.Uri (or System.String). 


Speeding up equality comparisons with structs 


The default structural equality comparison algorithm for structs is relatively slow. 
Taking over this process by overriding Equals can improve performance by a factor 
of five. Overloading the == operator and implementing IEquatable<T> allows 
unboxed equality comparisons, and this can speed things up by a factor of five 
again. 


Overriding equality semantics for reference types doesn't benefit
performance. The default algorithm for referential equality
comparison is already very fast because it simply compares
two 32- or 64-bit references.


There’s another, rather peculiar case for customizing equality, and that’s to improve 
a struct’s hashing algorithm for better performance in a hashtable. This comes as a 
result of the fact that equality comparison and hashing are joined at the hip. We 
examine hashing in a moment. 


How to override equality semantics 


Here is a summary of the steps: 


1. Override GetHashCode() and Equals(). 
2. (Optionally) overload != and ==. 


3. (Optionally) implement IEquatable<T>. 
Overriding GetHashCode


It might seem odd that System.Object—with its small footprint of members— 
defines a method with a specialized and narrow purpose. GetHashCode is a virtual 
method in object that fits this description; it exists primarily for the benefit of just 
the following two types: 


System.Collections.Hashtable 
System.Collections.Generic.Dictionary<TKey, TValue> 


These are hashtables—collections for which each element has a key used for storage 
and retrieval. A hashtable applies a very specific strategy for efficiently allocating 
elements based on their key. This requires that each key have an Int32 number, or 
hash code. The hash code need not be unique for each key, but should be as varied as 
possible for good hashtable performance. Hashtables are considered important 
enough that GetHashCode is defined in System.Object—so that every type can emit 
a hash code. 


We describe hashtables in detail in “Dictionaries” on page 344 
in Chapter 7. 


Both reference and value types have default implementations of GetHashCode, 
meaning that you don’t need to override this method—unless you override Equals. 
(And if you override GetHashCode, you will almost certainly want to also override 
Equals.) 


Here are the other rules for overriding object.GetHashCode:


• It must return the same value on two objects for which Equals returns true
(hence, GetHashCode and Equals are overridden together).


• It must not throw exceptions.


• It must return the same value if called repeatedly on the same object (unless the
object has changed).


For maximum performance in hashtables, you should write GetHashCode so as to 
minimize the likelihood of two different values returning the same hashcode. This 
gives rise to the third reason for overriding Equals and GetHashCode on structs, 
which is to provide a more efficient hashing algorithm than the default. The default 
implementation for structs is at the discretion of the runtime and can be based on 
every field in the struct. 


In contrast, the default GetHashCode implementation for classes is based on an internal
object token, which is unique for each instance in the CLR’s current
implementation. 
If an object’s hashcode changes after it’s been added as a key to
a dictionary, the object will no longer be accessible in the
dictionary. You can preempt this by basing hashcode calculations
on immutable fields.
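The following sketch demonstrates the problem with a hypothetical MutableKey class whose hash code is its mutable Id field. After the field changes, the dictionary searches under the new hash code and no longer finds the entry:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class whose hash code depends on a mutable field
public class MutableKey
{
  public int Id;
  public override bool Equals (object obj) => obj is MutableKey k && k.Id == Id;
  public override int GetHashCode() => Id;
}

public static class Program
{
  public static bool FoundAfterMutation()
  {
    var key = new MutableKey { Id = 1 };
    var dict = new Dictionary<MutableKey, string> { [key] = "one" };

    key.Id = 2;                      // The key's hash code changes after insertion
    return dict.ContainsKey (key);   // Searches under the new hash code
  }

  static void Main() => Console.WriteLine (FoundAfterMutation());   // False
}
```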


We provide a complete example illustrating how to override GetHashCode shortly. 


Overriding Equals 


The axioms for object.Equals are as follows:


• An object cannot equal null (unless it’s a nullable type).

• Equality is reflexive (an object equals itself).

• Equality is commutative (if a.Equals(b), then b.Equals(a)).

• Equality is transitive (if a.Equals(b) and b.Equals(c), then a.Equals(c)).


• Equality operations are repeatable and reliable (they don’t throw exceptions).


Overloading == and != 


In addition to overriding Equals, you can optionally overload the equality and 
inequality operators. This is nearly always done with structs because the consequence
of not doing so is that the == and != operators will simply not work on your
type. 


With classes, there are two ways to proceed: 


• Leave == and != alone, so that they apply referential equality.


• Overload == and != in line with Equals.


The first approach is most common with custom types—especially mutable types. It 
ensures that your type follows the expectation that == and != should exhibit referential
equality with reference types and this avoids confusing consumers. We saw an
example earlier: 


var sb1 = new StringBuilder ("foo"); 

var sb2 = new StringBuilder ("foo"); 

Console.WriteLine (sb1 == sb2); // False (referential equality) 
Console.WriteLine (sb1.Equals (sb2)); // True (value equality) 


The second approach makes sense with types for which a consumer would never
want referential equality. These are typically immutable (such as the string and
System.Uri classes) and are sometimes good candidates for structs.


Although it’s possible to overload != such that it means something
other than !(==), this is almost never done in practice,
except in cases such as comparing float.NaN.
Implementing IEquatable<T>


For completeness, it’s also good to implement IEquatable<T> when overriding 
Equals. Its results should always match those of the overridden object’s Equals 
method. Implementing IEquatable<T> comes at no programming cost if you structure
your Equals method implementation as in the example that follows in a
moment. 


An example: The Area struct 


Imagine that we need a struct to represent an area whose width and height are
interchangeable. In other words, 5 × 10 is equal to 10 × 5. (Such a type would be suitable
in an algorithm that arranges rectangular shapes.)


Here's the complete code: 


public struct Area : IEquatable<Area>
{
  public readonly int Measure1;
  public readonly int Measure2;

  public Area (int m1, int m2)
  {
    Measure1 = Math.Min (m1, m2);
    Measure2 = Math.Max (m1, m2);
  }

  public override bool Equals (object other)
  {
    if (!(other is Area)) return false;
    return Equals ((Area) other);        // Calls method below
  }

  public bool Equals (Area other)        // Implements IEquatable<Area>
    => Measure1 == other.Measure1 && Measure2 == other.Measure2;

  public override int GetHashCode()
    => HashCode.Combine (Measure1, Measure2);

  public static bool operator == (Area a1, Area a2) => a1.Equals (a2);

  public static bool operator != (Area a1, Area a2) => !a1.Equals (a2);
}


Here's another way to implement the Equals method, using
nullable value types:

Area? otherArea = other as Area?;
return otherArea.HasValue && Equals (otherArea.Value);


In implementing GetHashCode, we used .NET Core’s HashCode.Combine function to
produce a composite hashcode. (Before that function existed, a popular approach 
was to multiply each value by some prime number and then add them together.) 
Here’s a demonstration of the Area struct:


Area a1 = new Area (5, 10);
Area a2 = new Area (10, 5); 
Console.WriteLine (a1.Equals (a2)); // True 
Console.WriteLine (a1 == a2); // True 
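Because Area overrides GetHashCode and Equals consistently, it also behaves correctly in hashing collections. The sketch below condenses the struct above (its Equals(object) override uses a type pattern instead of an is-check followed by a cast) and adds instances to a HashSet<Area>:

```csharp
using System;
using System.Collections.Generic;

public struct Area : IEquatable<Area>
{
  public readonly int Measure1;
  public readonly int Measure2;

  public Area (int m1, int m2)
  {
    Measure1 = Math.Min (m1, m2);
    Measure2 = Math.Max (m1, m2);
  }

  public override bool Equals (object other) => other is Area a && Equals (a);
  public bool Equals (Area other)
    => Measure1 == other.Measure1 && Measure2 == other.Measure2;
  public override int GetHashCode() => HashCode.Combine (Measure1, Measure2);

  public static bool operator == (Area a1, Area a2) => a1.Equals (a2);
  public static bool operator != (Area a1, Area a2) => !a1.Equals (a2);
}

public static class Program
{
  static void Main()
  {
    // HashSet<T> uses GetHashCode, then Equals, so 5 x 10 and 10 x 5
    // collapse into a single element
    var set = new HashSet<Area> { new Area (5, 10), new Area (10, 5), new Area (2, 3) };
    Console.WriteLine (set.Count);                       // 2
    Console.WriteLine (set.Contains (new Area (3, 2)));  // True
  }
}
```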


Pluggable equality comparers 


If you want a type to take on different equality semantics just for a specific scenario, 
you can use a pluggable IEqualityComparer. This is particularly useful in conjunction
with the standard collection classes, and we describe it in the following chapter,
in “Plugging in Equality and Order” on page 360 in Chapter 7. 


Order Comparison 


As well as defining standard protocols for equality, C# and .NET define two standard
protocols for determining the order of one object relative to another:


• The IComparable interfaces (IComparable and IComparable<T>)


• The < and > operators


The IComparable interfaces are used by general-purpose sorting algorithms. In the 
following example, the static Array.Sort method works because System.String 
implements the IComparable interfaces: 


string[] colors = { "Green", "Red", "Blue" }; 
Array.Sort (colors); 
foreach (string c in colors) Console.Write (c + " ");   // Blue Green Red


The < and > operators are more specialized, and they are intended mostly for 
numeric types. Because they are statically resolved, they can translate to highly efficient
bytecode, suitable for computationally intensive algorithms.


.NET Core also provides pluggable ordering protocols, via the IComparer interfaces.
We describe these in the final section of Chapter 7. 


IComparable 

The IComparable interfaces are defined as follows: 
public interface IComparable { int CompareTo (object other); } 
public interface IComparable<in T> { int CompareTo (T other); } 


The two interfaces represent the same functionality. With value types, the generic 
type-safe interface is faster than the nongeneric interface. In both cases, the 
CompareTo method works as follows: 

• If a comes after b, a.CompareTo(b) returns a positive number.


• If a is the same as b, a.CompareTo(b) returns 0.
• If a comes before b, a.CompareTo(b) returns a negative number.


For example: 


Console.WriteLine ("Beck".CompareTo ("Anne")); // 1 
Console.WriteLine ("Beck".CompareTo ("Beck"));    // 0
Console.WriteLine ("Beck".CompareTo ("Chris")); // -1 


Most of the base types implement both IComparable interfaces. These interfaces are 
also sometimes implemented when writing custom types. We provide an example 
shortly. 


IComparable versus Equals


Consider a type that both overrides Equals and implements the IComparable interfaces.
You’d expect that when Equals returns true, CompareTo should return 0. And
you’d be right. But here's the catch:


• When Equals returns false, CompareTo can return what it likes (as long as it’s
internally consistent)!


In other words, equality can be “fussier” than comparison, but not vice versa (violate
this and sorting algorithms will break). So, CompareTo can say, “All objects are
equal,” whereas Equals says, “But some are more equal than others!” 


A great example of this is System.String. string’s Equals method and == operator 
use ordinal comparison, which compares the Unicode point values of each character.
Its CompareTo method, however, uses a less fussy culture-dependent comparison.


On most computers, for instance, the strings “U” and “U” are different according to 
Equals, but the same according to CompareTo. 


In Chapter 7, we discuss the pluggable ordering protocol, IComparer, which allows 
you to specify an alternative ordering algorithm when sorting or instantiating a sorted
collection. A custom IComparer can further extend the gap between CompareTo
and Equals—a case-insensitive string comparer, for instance, will return 0 when 
comparing "A" and "a". The reverse rule still applies, however: CompareTo can 
never be fussier than Equals. 
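A quick check of that gap, using StringComparer.OrdinalIgnoreCase as one example of a case-insensitive comparer:

```csharp
using System;

public static class Program
{
  static void Main()
  {
    // string.Equals is ordinal and case-sensitive: "A" and "a" differ
    Console.WriteLine ("A".Equals ("a"));                                       // False

    // A case-insensitive comparer orders the two strings as equal
    Console.WriteLine (StringComparer.OrdinalIgnoreCase.Compare ("A", "a"));    // 0

    // The comparer's own Equals agrees with its Compare
    Console.WriteLine (StringComparer.OrdinalIgnoreCase.Equals ("A", "a"));     // True
  }
}
```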


When implementing the IComparable interfaces in a custom 
type, you can avoid running afoul of this rule by writing the 
first line of CompareTo as follows: 


if (Equals (other)) return 0; 


After that, it can return what it likes, as long as it’s consistent! 


< and >
Some types define < and > operators; for instance: 


bool after2010 = DateTime.Now > new DateTime (2010, 1, 1); 
You can expect the < and > operators, when implemented, to be functionally consistent
with the IComparable interfaces. This is standard practice across .NET Core.


It's also standard practice to implement the IComparable interfaces whenever < and 
> are overloaded, although the reverse is not true. In fact, most .NET types that 
implement IComparable do not overload < and >. This differs from the situation 
with equality, for which it’s normal to overload == when overriding Equals. 


Typically, > and < are overloaded only when: 
• A type has a strong intrinsic concept of “greater than” and “less than” (versus
IComparable’s broader concepts of “comes before” and “comes after”).
• There is only one way or context in which to perform the comparison.


• The result is invariant across cultures.
System.String doesn't satisfy the last point: the results of string comparisons can 
vary according to language. Hence, string doesn’t support the > and < operators: 


bool error = "Beck" > "Anne"; // Compile-time error 
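When you need such a test anyway, you can call CompareTo (culture-sensitive) or an explicitly ordinal method such as string.CompareOrdinal and compare the result with zero. In this sketch, ComesAfterOrdinal is a hypothetical helper name:

```csharp
using System;

public static class Program
{
  // Hypothetical helper: an explicit, ordinal "comes after" test standing in
  // for the > operator that string does not define
  public static bool ComesAfterOrdinal (string a, string b)
    => string.CompareOrdinal (a, b) > 0;

  static void Main()
  {
    Console.WriteLine ("Beck".CompareTo ("Anne") > 0);        // True (culture-sensitive)
    Console.WriteLine (ComesAfterOrdinal ("Beck", "Anne"));   // True (ordinal)
  }
}
```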


Implementing the IComparable Interfaces


In the following struct, representing a musical note, we implement the IComparable 
interfaces as well as overloading the < and > operators. For completeness, we also 
override Equals/GetHashCode and overload == and !=. 


public struct Note : IComparable<Note>, IEquatable<Note>, IComparable
{
  int _semitonesFromA;
  public int SemitonesFromA { get { return _semitonesFromA; } }

  public Note (int semitonesFromA)
  {
    _semitonesFromA = semitonesFromA;
  }

  public int CompareTo (Note other)            // Generic IComparable<T>
  {
    if (Equals (other)) return 0;              // Fail-safe check
    return _semitonesFromA.CompareTo (other._semitonesFromA);
  }

  int IComparable.CompareTo (object other)     // Nongeneric IComparable
  {
    if (!(other is Note))
      throw new InvalidOperationException ("CompareTo: Not a note");
    return CompareTo ((Note) other);
  }

  public static bool operator < (Note n1, Note n2)
    => n1.CompareTo (n2) < 0;

  public static bool operator > (Note n1, Note n2)
    => n1.CompareTo (n2) > 0;

  public bool Equals (Note other)              // for IEquatable<Note>
    => _semitonesFromA == other._semitonesFromA;

  public override bool Equals (object other)
  {
    if (!(other is Note)) return false;
    return Equals ((Note) other);
  }

  public override int GetHashCode() => _semitonesFromA.GetHashCode();
  public static bool operator == (Note n1, Note n2) => n1.Equals (n2);

  public static bool operator != (Note n1, Note n2) => !(n1 == n2);
}
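The sketch below exercises Note through a generic sort and the overloaded operators. It repeats the struct in condensed form (same logic, written with expression-bodied members and a readonly field) so that the example compiles on its own:

```csharp
using System;
using System.Collections.Generic;

public struct Note : IComparable<Note>, IEquatable<Note>, IComparable
{
  readonly int _semitonesFromA;
  public int SemitonesFromA => _semitonesFromA;
  public Note (int semitonesFromA) => _semitonesFromA = semitonesFromA;

  public int CompareTo (Note other)
    => Equals (other) ? 0 : _semitonesFromA.CompareTo (other._semitonesFromA);

  int IComparable.CompareTo (object other)
    => other is Note n ? CompareTo (n)
                       : throw new InvalidOperationException ("CompareTo: Not a note");

  public static bool operator < (Note n1, Note n2) => n1.CompareTo (n2) < 0;
  public static bool operator > (Note n1, Note n2) => n1.CompareTo (n2) > 0;

  public bool Equals (Note other) => _semitonesFromA == other._semitonesFromA;
  public override bool Equals (object other) => other is Note n && Equals (n);
  public override int GetHashCode() => _semitonesFromA.GetHashCode();
  public static bool operator == (Note n1, Note n2) => n1.Equals (n2);
  public static bool operator != (Note n1, Note n2) => !(n1 == n2);
}

public static class Program
{
  static void Main()
  {
    var notes = new List<Note> { new Note (7), new Note (-2), new Note (3) };

    notes.Sort();   // Comparer<Note>.Default picks up IComparable<Note>
    foreach (Note n in notes) Console.Write (n.SemitonesFromA + " ");   // -2 3 7
    Console.WriteLine();

    Console.WriteLine (new Note (3) < new Note (7));    // True
    Console.WriteLine (new Note (3) == new Note (3));   // True
  }
}
```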


Utility Classes 


Console 


The static Console class handles standard input/output for console-based applications.
In a command-line (Console) application, the input comes from the keyboard
via Read, ReadKey, and ReadLine, and the output goes to the text window via Write 
and WriteLine. You can control the window’s position and dimensions with the 
properties WindowLeft, WindowTop, WindowHeight, and WindowWidth. You can also 
change the BackgroundColor and ForegroundColor properties and manipulate the 
cursor with the CursorLeft, CursorTop, and CursorSize properties: 


Console.WindowWidth = Console.LargestWindowWidth; 
Console.ForegroundColor = ConsoleColor.Green; 
Console.Write ("test... 50%"); 

Console.CursorLeft -= 3; 

Console.Write ("90%"); // test... 90% 


The Write and WriteLine methods are overloaded to accept a composite format 
string (see String.Format in “String and Text Handling” on page 243). However, 
neither method accepts a format provider, so you’re stuck with
CultureInfo.CurrentCulture. (The workaround, of course, is to explicitly call
string.Format.) 


The Console.Out property returns a TextWriter. Passing Console.Out to a method
that expects a TextWriter is a useful way to get that method to write to the Console 
for diagnostic purposes. 
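A minimal sketch of that idea: DumpSettings is a hypothetical diagnostic method that accepts any TextWriter, so it can write to Console.Out or to an in-memory StringWriter:

```csharp
using System;
using System.IO;

public static class Program
{
  // Hypothetical diagnostic method that writes to any TextWriter
  public static void DumpSettings (TextWriter writer, int retries, bool verbose)
  {
    writer.WriteLine ($"Retries: {retries}");
    writer.WriteLine ($"Verbose: {verbose}");
  }

  static void Main()
  {
    DumpSettings (Console.Out, 3, true);    // Appears in the console window

    var sw = new StringWriter();            // Same method, captured in memory
    DumpSettings (sw, 3, true);
    Console.Write (sw.ToString());
  }
}
```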


You can also redirect the Console’s input and output streams via the SetIn and 
SetOut methods: 


// First save existing output writer: 
System.IO.TextWriter oldOut = Console.Out;
// Redirect the console's output to a file:
using (System.IO.TextWriter w = System.IO.File.CreateText
("e:\\output.txt"))


{ 
Console.SetOut (w); 


Console.WriteLine ("Hello world"); 


} 


// Restore standard console output 
Console.SetOut (oldOut); 


In Chapter 15, we describe how streams and text writers work. 
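The same save, redirect, and restore pattern works with any TextWriter. This sketch (CaptureConsole is a hypothetical helper) redirects output into a StringWriter and restores the original writer in a finally block, so an exception cannot leave the console redirected:

```csharp
using System;
using System.IO;

public static class Program
{
  // Runs an action while temporarily redirecting Console output into a string
  public static string CaptureConsole (Action action)
  {
    TextWriter oldOut = Console.Out;      // Save the existing writer
    var sw = new StringWriter();
    try
    {
      Console.SetOut (sw);
      action();
    }
    finally
    {
      Console.SetOut (oldOut);            // Always restore it
    }
    return sw.ToString();
  }

  static void Main()
  {
    string captured = CaptureConsole (() => Console.WriteLine ("Hello world"));
    Console.WriteLine ("Captured: " + captured.Trim());   // Captured: Hello world
  }
}
```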


When running WPF or Windows Forms applications under 
Visual Studio, the Console’s output is automatically redirected 
to Visual Studio's output window (in debug mode). This can 
make Console.Write useful for diagnostic purposes, although
in most cases, the Debug and Trace classes in the 
System.Diagnostics namespace are more appropriate (see 
Chapter 13). 


Environment 


The static System.Environment class provides a range of useful properties:


Files and folders 
CurrentDirectory, SystemDirectory, CommandLine 


Computer and operating system 
MachineName, ProcessorCount, OSVersion, NewLine 


User logon 
UserName, UserInteractive, UserDomainName 


Diagnostics 
TickCount, StackTrace, WorkingSet, Version 


You can obtain additional folders by calling GetFolderPath; we describe this in 
“File and Directory Operations” on page 665 in Chapter 15. 


You can access OS environment variables (what you see when you type “set” at the 
command prompt) with the following three methods: GetEnvironmentVariable, 
GetEnvironmentVariables, and SetEnvironmentVariable. 
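Here is a short sketch of the three methods. DEMO_SETTING is an arbitrary name chosen for illustration; by default, SetEnvironmentVariable affects only the current process, and passing null as the value removes the variable:

```csharp
using System;

public static class Program
{
  static void Main()
  {
    // DEMO_SETTING is an arbitrary name; this affects only the current process
    Environment.SetEnvironmentVariable ("DEMO_SETTING", "42");
    Console.WriteLine (Environment.GetEnvironmentVariable ("DEMO_SETTING"));           // 42

    // Passing null as the value removes the variable
    Environment.SetEnvironmentVariable ("DEMO_SETTING", null);
    Console.WriteLine (Environment.GetEnvironmentVariable ("DEMO_SETTING") == null);   // True

    // GetEnvironmentVariables returns every variable as an IDictionary
    Console.WriteLine (Environment.GetEnvironmentVariables().Count);
  }
}
```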


The ExitCode property lets you set the return code—for when your program is 
called from a command or batch file—and the FailFast method terminates a program
immediately, without performing cleanup.


The Environment class available to Windows Store apps offers just a limited number 
of members (ProcessorCount, NewLine, and FailFast). 
Process


The Process class in System.Diagnostics allows you to launch a new process. (In
Chapter 13, we describe how you can also use it to interact with other processes
running on the computer.)


For security reasons, the Process class is not available to
Windows Store apps, and you cannot start arbitrary processes.
Instead, you must use the Windows.System.Launcher class to
“launch” a URI or file to which you have access; for example:


Launcher.LaunchUriAsync (new Uri ("http://albahari.com")); 


var file = await KnownFolders.DocumentsLibrary 
.GetFileAsync ("foo.txt"); 
Launcher.LaunchFileAsync (file); 


This opens the URI or file, using whatever program is associ- 
ated with the URI scheme or file extension. Your program 
must be in the foreground for this to work. 


The static Process.Start method has several overloads; the simplest accepts a
simple filename with optional arguments:


Process.Start ("notepad.exe"); 
Process.Start ("notepad.exe", "e:\\file.txt"); 


The most flexible overload accepts a ProcessStartInfo instance. With this, you can 
capture and redirect the launched process's input, output, and error output (if you 
leave UseShellExecute as false). The following captures the output of calling 
ipconfig: 


ProcessStartInfo psi = new ProcessStartInfo
{
  FileName = "cmd.exe",
  Arguments = "/c ipconfig /all",
  RedirectStandardOutput = true,
  UseShellExecute = false
};
Process p = Process.Start (psi);
string result = p.StandardOutput.ReadToEnd();
Console.WriteLine (result);


If you don't redirect output, Process.Start executes the program in parallel to the
caller. If you want to wait for the new process to complete, you can call WaitForExit
on the Process object, with an optional timeout.


Redirecting output and error streams 


With UseShellExecute false (the default in .NET Core), you can capture the
standard input, output, and error streams and then write/read these streams via the
StandardInput, StandardOutput, and StandardError properties.


A difficulty arises when you need to redirect both the standard output and standard 
error streams, in that you can't usually know in which order to read data from each 
(because you don't know in advance how the data will be interleaved). The solution
is to read from both streams at once, which you can accomplish by reading from (at 
least) one of the streams asynchronously. Here’s how to do this: 


• Handle the OutputDataReceived and/or ErrorDataReceived events. These
  events fire when output/error data is received.
• Call BeginOutputReadLine and/or BeginErrorReadLine. This enables the
  aforementioned events.


The following method runs an executable while capturing both the output and error 
streams: 


(string output, string errors) Run (string exePath, string args = "")
{
  using var p = Process.Start (new ProcessStartInfo (exePath, args)
  {
    RedirectStandardOutput = true,
    RedirectStandardError = true,
    UseShellExecute = false,
  });

  var errors = new StringBuilder ();

  // Read from the error stream asynchronously...
  p.ErrorDataReceived += (sender, errorArgs) =>
  {
    if (errorArgs.Data != null) errors.AppendLine (errorArgs.Data);
  };

  p.BeginErrorReadLine ();

  // ...while we read from the output stream synchronously:
  string output = p.StandardOutput.ReadToEnd();

  p.WaitForExit();
  return (output, errors.ToString());
}


UseShellExecute 
The UseShellExecute flag changes how the CLR starts the process. With
UseShellExecute true, you can do the following:
• Specify a path to a file or document rather than an executable (resulting in the
  operating system opening the file or document with its associated application)
• Specify a URL (resulting in the operating system navigating to that URL in the
  default web browser)
• (Windows only) Specify a Verb (such as runas, to run the process with
  administrative elevation)
In .NET Core, the default for UseShellExecute is false,
whereas in .NET Framework, it was true. Because this is a 
breaking change, it’s worth checking all calls to 
Process.Start when porting code from .NET Framework 
to .NET Core. 


The drawback is that you cannot redirect the input or output streams. Should you
need to do so while launching a file or document, a workaround is to set
UseShellExecute to false and invoke the command-line process (cmd.exe) with the
“/c” switch, as we did earlier when calling ipconfig.


Under Windows, UseShellExecute instructs the CLR to use the Windows ShellExecute
function instead of the CreateProcess function. Under Linux, UseShellExecute
instructs the CLR to call xdg-open, gnome-open, or kfmclient.


AppContext 


The static System.AppContext class exposes two useful properties:


• BaseDirectory returns the folder in which the application started. This folder
  is important for assembly resolution (finding and loading dependencies) and
  locating configuration files (such as appsettings.json).

• TargetFrameworkName tells you the name and version of the .NET Core framework
  that the application targets (as specified in its .runtimeconfig.json file).
  This might be older than the runtime actually in use.
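For example, a program might use BaseDirectory to locate a file deployed next to its executable. The sketch below assumes such a file is named appsettings.json (it only builds the path and does not check that the file exists), and it treats a null TargetFrameworkName defensively. AppContextInfo is a hypothetical name:

```csharp
using System;
using System.IO;

static class AppContextInfo
{
  // Builds the full path of a file deployed alongside the application.
  // "appsettings.json" is just an example file name.
  public static string SettingsPath() =>
    Path.Combine (AppContext.BaseDirectory, "appsettings.json");

  // Defensive: report a placeholder if no target framework name is available.
  public static string TargetFramework() =>
    AppContext.TargetFrameworkName ?? "(unknown)";
}
```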


In addition, the AppContext class manages a global string-keyed dictionary of 
Boolean values, intended to offer library writers a standard mechanism for allowing 
consumers to switch new features on or off. This untyped approach makes sense 
with experimental features that you want to keep undocumented to the majority of 
users. 


The consumer of a library requests that you enable a feature as follows: 
AppContext.SetSwitch ("MyLibrary.SomeBreakingChange", true); 
Code within that library can then check for that switch as follows: 


bool isDefined, switchValue; 
isDefined = AppContext.TryGetSwitch ("MyLibrary.SomeBreakingChange", 
out switchValue); 


TryGetSwitch returns false if the switch is undefined; this lets you distinguish an 
undefined switch from one whose value is set to false, should this be necessary. 


Ironically, the design of TryGetSwitch illustrates how not to
write APIs. The out parameter is unnecessary, and the method
should instead return a nullable bool whose value is true,
false, or null for undefined. This would then enable the
following use:


bool switchValue = AppContext.GetSwitch ("...") ?? false; 
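Because AppContext has no such GetSwitch method, you could approximate the suggested design with a small wrapper of your own. In the following sketch, AppContextExtras.GetSwitch is a hypothetical helper, not part of .NET. It maps TryGetSwitch's two outputs onto a nullable bool, so an undefined switch can still be told apart from one set to false:

```csharp
using System;

static class AppContextExtras
{
  // Hypothetical helper (not a .NET API): returns the switch's value,
  // or null if the switch has not been defined.
  public static bool? GetSwitch (string switchName) =>
    AppContext.TryGetSwitch (switchName, out bool value)
      ? value
      : (bool?) null;
}
```

With this helper, the call in the note becomes bool switchValue = AppContextExtras.GetSwitch ("MyLibrary.SomeBreakingChange") ?? false;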
Collections


.NET Core provides a standard set of types for storing and managing collections of 
objects. These include resizable lists, linked lists, sorted and unsorted dictionaries, 
as well as arrays. Of these, only arrays form part of the C# language; the remaining 
collections are just classes you instantiate like any other. 


We can divide the types in the Framework for collections into the following 
categories: 

• Interfaces that define standard collection protocols
• Ready-to-use collection classes (lists, dictionaries, etc.)
• Base classes for writing application-specific collections
This chapter covers each of these categories, with an additional section on the types 
used in determining element equality and order. 


The collection namespaces are as follows: 


Namespace                        Contains
System.Collections               Nongeneric collection classes and interfaces
System.Collections.Specialized   Strongly typed nongeneric collection classes
System.Collections.Generic       Generic collection classes and interfaces
System.Collections.ObjectModel   Proxies and bases for custom collections
System.Collections.Concurrent    Thread-safe collections (see Chapter 23)





Enumeration 


In computing, there are many different kinds of collections, ranging from simple 
data structures such as arrays or linked lists, to more complex ones such as red/ 
black trees and hashtables. Although the internal implementation and external 
characteristics of these data structures vary widely, the ability to traverse the
contents of the collection is an almost universal need. The Framework supports this
need via a pair of interfaces (IEnumerable, IEnumerator, and their generic
counterparts) that allow different data structures to expose a common traversal API. These
are part of a larger set of collection interfaces illustrated in Figure 7-1.





[Figure: nongeneric and generic collection interfaces. IEnumerator/IEnumerator<T> and IEnumerable/IEnumerable<T> provide enumeration only; *ICollection<T> has added functionality.]

Figure 7-1. Collection interfaces


IEnumerable and IEnumerator


The IEnumerator interface defines the basic low-level protocol by which elements
in a collection are traversed (or enumerated) in a forward-only manner. Its
declaration is as follows:


public interface IEnumerator 


{ 
bool MoveNext(); 
object Current { get; } 
void Reset(); 


} 


MoveNext advances the current element or “cursor” to the next position, returning
false if there are no more elements in the collection. Current returns the element
at the current position (usually cast from object to a more specific type). MoveNext
must be called before retrieving the first element; this is to allow for an empty
collection. The Reset method, if implemented, moves back to the start, allowing the
collection to be enumerated again. Reset exists mainly for Component Object
Model (COM) interoperability; calling it directly is generally avoided because it’s
not universally supported (and is unnecessary in that it’s usually just as easy to
instantiate a new enumerator).


Collections do not usually implement enumerators; instead, they provide
enumerators via the interface IEnumerable:
public interface IEnumerable


{ 


IEnumerator GetEnumerator(); 


} 


By defining a single method returning an enumerator, IEnumerable provides
flexibility in that the iteration logic can be farmed off to another class. Moreover, it means
that several consumers can enumerate the collection at once without interfering
with one another. You can think of IEnumerable as “IEnumeratorProvider,” and it is
the most basic interface that collection classes implement.


The following example illustrates low-level use of IEnumerable and IEnumerator: 
string s = "Hello"; 


// Because string implements IEnumerable, we can call GetEnumerator(): 
IEnumerator rator = s.GetEnumerator(); 


while (rator.MoveNext()) 
{ 


char c = (char) rator.Current; 
Console.Write (c + "."); 


} 


// Output: H.e.l.l.o. 


However, it’s rare to call methods on enumerators directly in this manner because 
C# provides a syntactic shortcut: the foreach statement. Here’s the same example 
rewritten using foreach: 


string s = "Hello"; // The string class implements IEnumerable 


foreach (char c in s) 
Console.Write (c + "."); 


IEnumerable<T> and IEnumerator<T>


IEnumerator and IEnumerable are nearly always implemented in conjunction with 
their extended generic versions: 


public interface IEnumerator<T> : IEnumerator, IDisposable 


{ 
T Current { get; } 


} 


public interface IEnumerable<T> : IEnumerable 


{ 


IEnumerator<T> GetEnumerator(); 


} 


By defining a typed version of Current and GetEnumerator, these interfaces 
strengthen static type safety, avoid the overhead of boxing with value-type elements, 
and are more convenient to the consumer. Arrays automatically implement 
IEnumerable<T> (where T is the member type of the array). 
Thanks to the improved static type safety, calling the following method with an
array of characters will generate a compile-time error: 


void Test (IEnumerable<int> numbers) { ... } 


It's a standard practice for collection classes to publicly expose IEnumerable<T>
while “hiding” the nongeneric IEnumerable through explicit interface
implementation. This is so that if you directly call GetEnumerator(), you get back the type-safe
generic IEnumerator<T>. There are times, though, when this rule is broken for
reasons of backward compatibility (generics did not exist prior to C# 2.0). A good
example is arrays: these must return the nongeneric (the nice way of putting it is
classic) IEnumerator to avoid breaking earlier code. To get a generic
IEnumerator<T>, you must cast to expose the explicit interface:


int[] data = { 1, 2, 3 }; 
var rator = ((IEnumerable<int>)data).GetEnumerator();


Fortunately, you rarely need to write this sort of code, thanks to the foreach 
statement. 


IEnumerable<T> and IDisposable


IEnumerator<T> inherits from IDisposable. This allows enumerators to hold
references to resources such as database connections, and ensure that those resources
are released when enumeration is complete (or abandoned partway through). The
foreach statement recognizes this detail and translates the following:


foreach (var element in somethingEnumerable) { ... }

into the logical equivalent of this:

using (var rator = somethingEnumerable.GetEnumerator())
  while (rator.MoveNext())
  {
    var element = rator.Current;
    ...
  }


The using block ensures disposal—more on IDisposable in Chapter 12. 
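You can observe this disposal with an iterator that has a finally block, because disposing a compiler-generated enumerator runs any pending finally blocks. In the sketch below (DisposalDemo and its members are illustrative names), the loop breaks after the first element, yet the finally block still executes when foreach's implicit using disposes the enumerator:

```csharp
using System.Collections.Generic;

static class DisposalDemo
{
  public static readonly List<string> Log = new List<string>();

  // The finally block runs when the enumerator is disposed,
  // even if enumeration stops early.
  static IEnumerable<int> Numbers()
  {
    try
    {
      for (int i = 1; i <= 3; i++)
      {
        Log.Add ("yield " + i);
        yield return i;
      }
    }
    finally
    {
      Log.Add ("disposed");
    }
  }

  // Abandons enumeration after the first element.
  public static void BreakEarly()
  {
    foreach (int n in Numbers())
      if (n == 1) break;
  }
}
```

After calling DisposalDemo.BreakEarly(), Log contains "yield 1" followed by "disposed"; elements 2 and 3 are never produced.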


Implementing the Enumeration Interfaces 


You might want to implement IEnumerable or IEnumerable<T> for one or more of 
the following reasons: 

• To support the foreach statement
• To interoperate with anything expecting a standard collection
• To meet the requirements of a more sophisticated collection interface
• To support collection initializers





318 | Chapter 7: Collections 





When to Use the Nongeneric Interfaces 


Given the extra type safety of the generic collection interfaces such as
IEnumerable<T>, the question arises: do you ever need to use the nongeneric IEnumerable (or
ICollection or IList)?


In the case of IEnumerable, you must implement this interface in conjunction with 
IEnumerable<T>—because the latter derives from the former. However, it’s very rare 
that you actually implement these interfaces from scratch: in nearly all cases, you 
can take the higher-level approach of using iterator methods, Collection<T>, and 
LINQ. 


So, what about as a consumer? In nearly all cases, you can manage entirely with the 
generic interfaces. The nongeneric interfaces are still occasionally useful, though, in 
their ability to provide type unification for collections across all element types. The 
following method, for instance, counts elements in any collection recursively: 


public static int Count (IEnumerable e) 


{ 
int count = 0; 
foreach (object element in e) 


{ 


var subCollection = element as IEnumerable; 
if (subCollection != null) 

count += Count (subCollection); 
else 

count++; 


} 


return count; 


} 
Because C# offers covariance with generic interfaces, it might seem valid to have
this method instead accept IEnumerable<object>. This, however, would fail with
value-type elements and with legacy collections that don't implement
IEnumerable<T>; an example is ControlCollection in Windows Forms.


(On a slight tangent, you might have noticed a potential bug in our example: cyclic 
references will cause infinite recursion and crash the method. We could fix this most 
easily with the use of a HashSet [see “HashSet<T> and SortedSet<T>” on page 342].) 
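One way to apply that fix is sketched below. It assumes we want to detect only true cycles, so the HashSet holds the collections on the current recursion path (compared by reference) rather than every collection ever visited. A collection that legitimately appears twice, such as the same string in two slots, is therefore still counted twice, as in the original method. SafeCounter is a hypothetical name:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

static class SafeCounter
{
  // Compares objects by reference, ignoring any overridden Equals.
  sealed class ReferenceComparer : IEqualityComparer<object>
  {
    bool IEqualityComparer<object>.Equals (object x, object y) => ReferenceEquals (x, y);
    int IEqualityComparer<object>.GetHashCode (object obj) => RuntimeHelpers.GetHashCode (obj);
  }

  public static int Count (IEnumerable e) =>
    Count (e, new HashSet<object> (new ReferenceComparer()));

  // inProgress holds the collections currently being counted further up
  // the call stack; meeting one of them again means we've hit a cycle.
  static int Count (IEnumerable e, HashSet<object> inProgress)
  {
    if (!inProgress.Add (e)) return 0;    // Cycle: don't recurse again

    int count = 0;
    foreach (object element in e)
    {
      if (element is IEnumerable subCollection)
        count += Count (subCollection, inProgress);
      else
        count++;
    }

    inProgress.Remove (e);                // Finished with this branch
    return count;
  }
}
```

Tracking every visited collection instead would also stop the recursion, but it would silently skip shared (noncyclic) subcollections, changing the count.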
To implement IEnumerable/IEnumerable<T>, you must provide an enumerator. You
can do this in one of three ways: 





• If the class is “wrapping” another collection, by returning the wrapped
  collection’s enumerator
• Via an iterator using yield return
• By instantiating your own IEnumerator/IEnumerator<T> implementation
You can also subclass an existing collection: Collection<T> is
designed just for this purpose (see “Customizable Collections 
and Proxies” on page 351). Yet another approach is to use the 
LINQ query operators, which we cover in Chapter 8. 


Returning another collection’s enumerator is just a matter of calling GetEnumerator
on the inner collection. However, this is viable only in the simplest scenarios in
which the items in the inner collection are exactly what are required. A more
flexible approach is to write an iterator, using C#’s yield return statement. An iterator
is a C# language feature that assists in writing collections, in the same way the
foreach statement assists in consuming collections. An iterator automatically
handles the implementation of IEnumerable and IEnumerator (or their generic
versions). Here’s a simple example:


public class MyCollection : IEnumerable 


{ 
int[] data = { 1, 2, 3 }; 


public IEnumerator GetEnumerator() 
{ 
foreach (int i in data) 
yield return i; 
} 
} 


Notice the black magic: GetEnumerator doesn't appear to return an enumerator at
all! Upon parsing the yield return statement, the compiler writes a hidden nested
enumerator class behind the scenes, and then refactors GetEnumerator to
instantiate and return that class. Iterators are powerful and simple (and are used extensively
in the implementation of the LINQ to Objects standard query operators).


Keeping with this approach, we can also implement the generic interface
IEnumerable<T>:


public class MyGenCollection : IEnumerable<int> 


{ 
int[] data = { 1, 2, 3 }; 


public IEnumerator<int> GetEnumerator() 


{ 
foreach (int i in data) 
yield return i; 


} 


// Explicit implementation keeps it hidden: 
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); 


} 


Because IEnumerable<T> inherits from IEnumerable, we must implement both the 
generic and the nongeneric versions of GetEnumerator. In accordance with 
standard practice, we've implemented the nongeneric version explicitly. It can 
simply call the generic GetEnumerator because IEnumerator<T> inherits from
IEnumerator. 


The class we've just written would be suitable as a basis from which to write a more 
sophisticated collection. However, if we need nothing above a simple
IEnumerable<T> implementation, the yield return statement allows for an easier 
variation. Rather than writing a class, you can move the iteration logic into a 
method returning a generic IEnumerable<T> and let the compiler take care of the 
rest. Here’s an example: 


public static IEnumerable<int> GetSomeIntegers()


{ 


yield return 1; 
yield return 2; 
yield return 3; 


} 


Here’s our method in use: 


foreach (int i in Test.GetSomeIntegers()) 
Console.WriteLine (i); 
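Because the compiler turns such a method into a hidden enumerator class, none of the method body runs until the consumer first calls MoveNext, and each later call resumes just after the previous yield return. The sketch below makes this visible by logging; LazyDemo and OneTwo are illustrative names:

```csharp
using System.Collections.Generic;

static class LazyDemo
{
  public static readonly List<string> Log = new List<string>();

  // Each MoveNext call runs the body up to the next yield return.
  public static IEnumerable<int> OneTwo()
  {
    Log.Add ("started");
    yield return 1;
    Log.Add ("resumed");
    yield return 2;
  }
}
```

Calling LazyDemo.OneTwo() by itself logs nothing; "started" appears only on the first MoveNext, and "resumed" on the second.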


The final approach in writing GetEnumerator is to write a class that implements 
IEnumerator directly. This is exactly what the compiler does behind the scenes, in 
resolving iterators. (Fortunately, it’s rare that you'll need to go this far yourself.) The 
following example defines a collection that’s hardcoded to contain the integers 1, 2, 
and 3: 


public class MyIntList : IEnumerable
{
  int[] data = { 1, 2, 3 };

  public IEnumerator GetEnumerator() => new Enumerator (this);

  class Enumerator : IEnumerator       // Define an inner class
  {                                    // for the enumerator.
    MyIntList collection;
    int currentIndex = -1;

    public Enumerator (MyIntList items) => this.collection = items;

    public object Current
    {
      get
      {
        if (currentIndex == -1)
          throw new InvalidOperationException ("Enumeration not started!");
        if (currentIndex == collection.data.Length)
          throw new InvalidOperationException ("Past end of list!");
        return collection.data [currentIndex];
      }
    }

    public bool MoveNext()
    {
      // Once past the end, stay at Length so Current can detect it:
      if (currentIndex >= collection.data.Length) return false;
      return ++currentIndex < collection.data.Length;
    }

    public void Reset() => currentIndex = -1;
  }
}


Implementing Reset is optional—you can instead throw a 
NotSupportedException. 


Note that the first call to MoveNext should move to the first (and not the second) 
item in the list. 


To get on par with an iterator in functionality, we must also implement 
IEnumerator<T>. Here’s an example with bounds checking omitted for brevity: 


class MyIntList : IEnumerable<int>
{
  int[] data = { 1, 2, 3 };

  // The generic enumerator is compatible with both IEnumerable and
  // IEnumerable<T>. We implement the nongeneric GetEnumerator method
  // explicitly to avoid a naming conflict.

  public IEnumerator<int> GetEnumerator() => new Enumerator(this);
  IEnumerator IEnumerable.GetEnumerator() => new Enumerator(this);

  class Enumerator : IEnumerator<int>
  {
    int currentIndex = -1;
    MyIntList collection;

    public Enumerator (MyIntList items) => this.collection = items;

    public int Current => collection.data [currentIndex];
    object IEnumerator.Current => Current;

    public bool MoveNext() => ++currentIndex < collection.data.Length;
    public void Reset() => currentIndex = -1;

    // Given we don't need a Dispose method, it's good practice to
    // implement it explicitly, so it's hidden from the public interface.
    void IDisposable.Dispose() {}
  }
}


The example with generics is faster because IEnumerator<int>.Current doesn't 
require casting from int to object and so avoids the overhead of boxing. 
The ICollection and IList Interfaces


Although the enumeration interfaces provide a protocol for forward-only iteration
over a collection, they don't provide a mechanism to determine the size of the
collection, access a member by index, search, or modify the collection. For such
functionality, the .NET Framework defines the ICollection, IList, and IDictionary
interfaces. Each comes in both generic and nongeneric versions; however, the
nongeneric versions exist mostly for legacy support.


Figure 7-1 showed the inheritance hierarchy for these interfaces. The easiest way to 
summarize them is as follows: 


IEnumerable<T> (and IEnumerable) 
Provides minimum functionality (enumeration only) 


ICollection<T> (and ICollection) 
Provides medium functionality (e.g., the Count property) 


IList<T>/IDictionary<K,V> and their nongeneric versions 
Provide maximum functionality (including “random” access by index/key) 


It's rare that you'll need to implement any of these interfaces. 
In nearly all cases when you need to write a collection class, 
you can instead subclass Collection<T> (see “Customizable 
Collections and Proxies” on page 351). LINQ provides yet 
another option that covers many scenarios. 


The generic and nongeneric versions differ in ways over and above what you might 
expect, particularly in the case of ICollection. The reasons for this are mostly
historical: because generics came later, the generic interfaces were developed with the
benefit of hindsight, leading to a different (and better) choice of members. For this 
reason, ICollection<T> does not extend ICollection, IList<T> does not extend 
IList, and IDictionary<TKey, TValue> does not extend IDictionary. Of course, 
a collection class itself is free to implement both versions of an interface if beneficial 
(which it often is). 


Another, subtler reason for IList<T> not extending IList is 
that casting to IList<T> would then return an interface with 
both Add(T) and Add(object) members. This would
effectively defeat static type safety because you could call Add with
an object of any type. 


This section covers ICollection<T>, IList<T>, and their nongeneric versions; 
“Dictionaries” on page 344 covers the dictionary interfaces. 
There is no consistent rationale in the way the words collection
and list are applied throughout the .NET Framework. For 
instance, because IList<T> is a more functional version of 
ICollection<T>, you might expect the class List<T> to be 
correspondingly more functional than the class 
Collection<T>. This is not the case. It’s best to consider the 
terms collection and list as broadly synonymous, except when 
a specific type is involved. 


ICollection<T> and ICollection 


ICollection<T> is the standard interface for countable collections of objects. It
provides the ability to determine the size of a collection (Count), determine whether an
item exists in the collection (Contains), copy the collection into an array (CopyTo),
and determine whether the collection is read-only (IsReadOnly). For writable
collections, you can also Add, Remove, and Clear items from the collection. And
because it extends IEnumerable<T>, it can also be traversed via the foreach
statement:


public interface ICollection<T> : IEnumerable<T>, IEnumerable 


{ 
int Count { get; } 


bool Contains (T item); 
void CopyTo (T[] array, int arrayIndex); 
bool IsReadOnly { get; } 


void Add(T item); 
bool Remove (T item); 
void Clear(); 

} 


The nongeneric ICollection is similar in providing a countable collection, but it 
doesn't provide functionality for altering the list or checking for element 
membership: 


public interface ICollection : IEnumerable 


{ 
int Count { get; } 
bool IsSynchronized { get; } 
object SyncRoot { get; } 
void CopyTo (Array array, int index); 


}
The nongeneric interface also defines properties to assist with synchronization
(Chapter 14); these were dumped in the generic version because thread safety is no
longer considered intrinsic to the collection.


Both interfaces are fairly straightforward to implement. If implementing a read-only 
ICollection<T>, the Add, Remove, and Clear methods should throw a
NotSupportedException.
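As an illustration, here is a minimal sketch of a read-only ICollection<T> that wraps a copy of an array; ReadOnlyArrayCollection<T> is a hypothetical name. The query members delegate to the array, while the mutating members throw:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only collection over a private copy of an array.
public class ReadOnlyArrayCollection<T> : ICollection<T>
{
  readonly T[] items;

  public ReadOnlyArrayCollection (params T[] items) => this.items = (T[]) items.Clone();

  public int Count => items.Length;
  public bool IsReadOnly => true;

  public bool Contains (T item) => Array.IndexOf (items, item) >= 0;
  public void CopyTo (T[] array, int arrayIndex) => items.CopyTo (array, arrayIndex);

  // Mutating members throw, as recommended for read-only collections.
  public void Add (T item) => throw new NotSupportedException ("Collection is read-only.");
  public bool Remove (T item) => throw new NotSupportedException ("Collection is read-only.");
  public void Clear() => throw new NotSupportedException ("Collection is read-only.");

  public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>) items).GetEnumerator();
  IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```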
These interfaces are usually implemented in conjunction with either the IList or
the IDictionary interface. 


IList<T> and IList 


IList<T> is the standard interface for collections indexable by position. In addition
to the functionality inherited from ICollection<T> and IEnumerable<T>, it
provides the ability to read or write an element by position (via an indexer) and
insert/remove by position:


public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable 
{ 

T this [int index] { get; set; } 

int IndexOf (T item); 

void Insert (int index, T item); 

void RemoveAt (int index); 


} 


The IndexOf methods perform a linear search on the list, returning -1 if the
specified item is not found.


The nongeneric version of IList has more members because it inherits less from 
ICollection: 


public interface IList : ICollection, IEnumerable 
{ 
object this [int index] { get; set; }
bool IsFixedSize { get; } 
bool IsReadOnly { get; } 
int Add (object value); 
void Clear(); 
bool Contains (object value); 
int IndexOf (object value); 
void Insert (int index, object value); 
void Remove (object value); 
void RemoveAt (int index); 


} 


The Add method on the nongeneric IList interface returns an integer—this is the 
index of the newly added item. In contrast, the Add method on ICollection<T> has 
a void return type. 
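The following sketch shows the difference using List<T> through its nongeneric IList implementation; AddReturnDemo is a hypothetical name. Going through IList also means the element type is checked only at runtime: adding an object of the wrong type to a List<string> this way throws an ArgumentException rather than failing to compile:

```csharp
using System.Collections;

static class AddReturnDemo
{
  // IList.Add returns the index at which the item was added.
  public static int AddViaIList (IList list, object item) => list.Add (item);
}
```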
The general-purpose List<T> class is the quintessential implementation of both
IList<T> and IList. C# arrays also implement both the generic and nongeneric 
ILists (although the methods that add or remove elements are hidden via explicit 
interface implementation and throw a NotSupportedException if called). 
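A minimal sketch of that behavior follows (ArrayAsList and Probe are illustrative names): writing through IList<int>'s indexer updates the underlying array, whereas Add throws because arrays have a fixed length:

```csharp
using System;
using System.Collections.Generic;

static class ArrayAsList
{
  // Treats an array as IList<int>: the indexer writes through to the array,
  // but Add throws NotSupportedException.
  public static (int first, bool addThrew) Probe (int[] data)
  {
    IList<int> list = data;
    list[0] = 99;                    // Updates data[0]

    bool addThrew = false;
    try { list.Add (4); }
    catch (NotSupportedException) { addThrew = true; }

    return (data[0], addThrew);
  }
}
```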
An ArgumentException is thrown if you try to access a
multidimensional array via IList’s indexer. This is a trap when
writing methods such as the following:


public object FirstOrNull (IList list) 
{ 
if (list == null || list.Count == 0) return null;
return list[0]; 
} 
This might appear bulletproof, but it will throw an exception
if called with a multidimensional array. You can test for a
multidimensional array at runtime with this expression (more on
this in Chapter 19):


list.GetType().IsArray && list.GetType().GetArrayRank()>1 
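Another option, sketched below, is to avoid the indexer altogether and take the first element from the enumerator, which works for arrays of any rank (a multidimensional array enumerates its elements in row-major order). This is one possible workaround rather than the only one, and ListHelpers is a hypothetical name:

```csharp
using System.Collections;

static class ListHelpers
{
  // Variant of FirstOrNull that never calls IList's indexer,
  // so multidimensional arrays don't cause an ArgumentException.
  public static object FirstOrNull (IList list)
  {
    if (list == null) return null;
    foreach (object element in list)
      return element;                // First element, if any
    return null;                     // Empty list
  }
}
```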


IReadOnlyCollection<T> and IReadOnlyList<T> 


.NET Core also defines collection and list interfaces that expose just the members 
required for read-only operations: 


public interface IReadOnlyCollection<out T> : IEnumerable<T>, IEnumerable 


{ 
int Count { get; } 


} 


public interface IReadOnlyList<out T> : IReadOnlyCollection<T>,
                                        IEnumerable<T>, IEnumerable
{
  T this[int index] { get; }
}
Because the type parameter for these interfaces is used only in output positions, it’s
marked as covariant. This allows a list of cats, for instance, to be treated as a
read-only list of animals. In contrast, T is not marked as covariant with ICollection<T>
and IList<T>, because T is used in both input and output positions.
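The following sketch shows the covariant conversion in action; Animal, Cat, and CovarianceDemo are hypothetical types. A List<Cat> can be passed where an IReadOnlyList<Animal> is expected, whereas passing it as an IList<Animal> would not compile:

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration.
class Animal { public string Name = ""; }
class Cat : Animal { }

static class CovarianceDemo
{
  // Accepts any read-only list of animals, including a List<Cat>.
  public static string Names (IReadOnlyList<Animal> animals)
  {
    var names = new List<string>();
    for (int i = 0; i < animals.Count; i++)
      names.Add (animals[i].Name);
    return string.Join (",", names);
  }
}
```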


These interfaces represent a read-only view of a collection or 
list; the underlying implementation might still be writable. 
Most of the writable (mutable) collections implement both the 
read-only and read/write interfaces. 


In addition to letting you work with collections covariantly, the read-only interfaces 
allow a class to publicly expose a read-only view of a private writable collection. We 
demonstrate this—along with a better solution—in “ReadOnlyCollection<T>” on 
page 356. 
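A minimal sketch of that idea follows (Roster is a hypothetical class). Note that a caller could still cast Names back to List<string> and modify it, which is one limitation of this approach:

```csharp
using System.Collections.Generic;

// Hypothetical class that owns a writable list but exposes a read-only view.
public class Roster
{
  readonly List<string> names = new List<string>();   // Private and writable

  // Consumers see only the read-only interface.
  public IReadOnlyList<string> Names => names;

  public void Add (string name) => names.Add (name);
}
```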


IReadOnlyList<T> maps to the Windows Runtime type IVectorView<T>. 
The Array Class


The Array class is the implicit base class for all single and multidimensional arrays, 
and it is one of the most fundamental types implementing the standard collection 
interfaces. The Array class provides type unification, so a common set of methods is 
available to all arrays, regardless of their declaration or underlying element type. 


Because arrays are so fundamental, C# provides explicit syntax for their declaration 
and initialization, which we described in Chapter 2 and Chapter 3. When an array is 
declared using C#’s syntax, the CLR implicitly subtypes the Array class,
synthesizing a pseudotype appropriate to the array’s dimensions and element types. This
pseudotype implements the typed generic collection interfaces, such as
IList<string>.


The CLR also treats array types specially upon construction, assigning them a
contiguous space in memory. This makes indexing into arrays highly efficient, but
prevents them from being resized later on.


Array implements the collection interfaces up to IList<T> in both their generic and 
nongeneric forms. IList<T> itself is implemented explicitly, though, to keep Array’s 
public interface clean of methods such as Add or Remove, which throw an exception 
on fixed-length collections such as arrays. The Array class does actually offer a static 
Resize method, although this works by creating a new array and then copying over 
each element. As well as being inefficient, references to the array elsewhere in the 
program will still point to the original version. A better solution for resizable
collections is to use the List<T> class (described in the following section).
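The following sketch (ResizeDemo is an illustrative name) demonstrates both drawbacks: Array.Resize produces a new array, and a second variable that referenced the original still sees the old length:

```csharp
using System;

static class ResizeDemo
{
  // Array.Resize allocates a new array and copies the elements; only the
  // variable passed by ref is updated to point at the new array.
  public static (int resizedLength, int aliasLength, bool sameArray) Run()
  {
    int[] numbers = { 1, 2, 3 };
    int[] alias = numbers;           // Second reference to the same array
    Array.Resize (ref numbers, 5);
    return (numbers.Length, alias.Length, ReferenceEquals (numbers, alias));
  }
}
```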


An array can contain value-type or reference-type elements. Value-type elements 
are stored in place in the array, so an array of three long integers (each 8 bytes) will 
occupy 24 bytes of contiguous memory. A reference-type element, however,
occupies only as much space in the array as a reference (4 bytes in a 32-bit environment
or 8 bytes in a 64-bit environment). Figure 7-2 illustrates the effect, in memory, of 
the following program: 


StringBuilder[] builders = new StringBuilder [5]; 
builders [0] = new StringBuilder ("builder1"); 
builders [1] = new StringBuilder ("builder2"); 
builders [2] = new StringBuilder ("builder3"); 


long[] numbers = new long [3]; 
numbers [0] = 12345;
numbers [1] = 54321; 
Figure 7-2. Arrays in memory


Because Array is a class, arrays are always (themselves) reference types—regardless 
of the array’s element type. This means that the statement arrayB = arrayA results 
in two variables that reference the same array. Similarly, two distinct arrays will 
always fail an equality test, unless you employ a structural equality comparer, which 
compares every element of the array: 


using System.Collections;   // For IStructuralEquatable and StructuralComparisons

object[] a1 = { "string", 123, true };
object[] a2 = { "string", 123, true };

Console.WriteLine (a1 == a2);                          // False
Console.WriteLine (a1.Equals (a2));                    // False

IStructuralEquatable se1 = a1;
Console.WriteLine (se1.Equals (a2,
  StructuralComparisons.StructuralEqualityComparer));   // True


Arrays can be duplicated by calling the Clone method: arrayB = arrayA.Clone()
(casting the result, which is typed as object, back to the array type). However, this
results in a shallow clone, meaning that only the memory represented by the array
itself is copied. If the array contains value-type objects, the values themselves are
copied; if the array contains reference-type objects, just the references are copied
(resulting in two arrays whose members reference the same objects).
Figure 7-3 demonstrates the effect of adding the following code to our example:


StringBuilder[] builders2 = builders; 
StringBuilder[] shallowClone = (StringBuilder[]) builders.Clone(); 
Figure 7-3. Shallow-cloning an array


To create a deep copy—for which reference type subobjects are duplicated—you 
must loop through the array and clone each element manually. The same rules apply 
to other .NET collection types. 
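Because StringBuilder does not implement ICloneable, one way to deep-copy an array of builders like the one in the earlier example is to construct a fresh StringBuilder from each element's contents. This is a sketch under that assumption; DeepCopyDemo and DeepClone are hypothetical names:

```csharp
using System.Text;

static class DeepCopyDemo
{
    // Copies the array *and* each StringBuilder it references,
    // so the two arrays share no mutable element objects.
    public static StringBuilder[] DeepClone (StringBuilder[] source)
    {
        var copy = new StringBuilder [source.Length];
        for (int i = 0; i < source.Length; i++)
            copy [i] = source [i] == null
                ? null                                        // Preserve empty slots
                : new StringBuilder (source [i].ToString());  // New object, same text
        return copy;
    }
}
```

Appending to an element of the copy leaves the corresponding element of the source unchanged, which is the behavior a shallow Clone does not give you.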


Although Array is designed primarily for use with 32-bit indexers, it also has limited
support for 64-bit indexers (allowing an array to theoretically address up to 2^64
elements) via several methods that accept both Int32 and Int64 parameters. These
overloads are useless in practice because the CLR does not permit any object,
including arrays, to exceed two gigabytes in size (whether running on a 32- or 64-bit
environment).


Many of the methods on the Array class that you expect to be 
instance methods are in fact static methods. This is an odd 
design decision, and means that you should check for both 
static and instance methods when looking for a method on 
Array. 


Construction and Indexing 
The easiest way to create and index arrays is through C#’s language constructs:

int[] myArray = { 1, 2, 3 };
int first = myArray [0];
int last = myArray [myArray.Length - 1];

Alternatively, you can instantiate an array dynamically by calling Array.CreateInstance.
This allows you to specify element type and rank (number of dimensions) at runtime as
well as allowing nonzero-based arrays through specifying a lower bound. Nonzero-based
arrays are not compatible with the .NET Common Language Specification (CLS) and
should not be exposed as public members in a library that might be consumed by a
program written in F# or Visual Basic.


The GetValue and SetValue methods let you access elements in a dynamically cre- 
ated array (they also work on ordinary arrays): 


// Create a string array 2 elements in length:
Array a = Array.CreateInstance (typeof(string), 2);

a.SetValue ("hi", 0);                 // → a[0] = "hi";
a.SetValue ("there", 1);              // → a[1] = "there";
string s = (string) a.GetValue (0);   // → s = a[0];

// We can also cast to a C# array as follows:
string[] cSharpArray = (string[]) a;
string s2 = cSharpArray [0];


Zero-indexed arrays created dynamically can be cast to a C# array of a matching or
compatible type (compatible by standard array-variance rules). For example, if
Apple subclasses Fruit, Apple[] can be cast to Fruit[]. This raises the question of
why object[] was not used as the unifying array type rather than the Array class. The
answer is that object[] is incompatible with both multidimensional and value-type
arrays (as well as nonzero-based arrays). An int[] array cannot be cast to object[].
Hence, we require the Array class for full type unification.
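A quick way to check these rules is to test an array's runtime type with the is operator. This sketch reuses the Apple and Fruit names from the text as empty placeholder classes; ArrayVarianceDemo is a hypothetical helper:

```csharp
using System;

class Fruit { }
class Apple : Fruit { }     // Placeholder subclass, as in the text

static class ArrayVarianceDemo
{
    public static bool AppleArrayIsFruitArray()
    {
        object apples = new Apple [2];
        return apples is Fruit[];      // true: reference-type arrays are covariant
    }

    public static bool IntArrayIsObjectArray()
    {
        object ints = new int [2];
        return ints is object[];       // false: value-type arrays are not
    }

    public static bool IntArrayIsArray()
    {
        object ints = new int [2];
        return ints is Array;          // true: every array derives from Array
    }
}
```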


GetValue and SetValue also work on compiler-created arrays, and they are useful
when writing methods that can deal with an array of any type and rank. For
multidimensional arrays, they accept an array of indices:

public object GetValue (params int[] indices);
public void SetValue (object value, params int[] indices);


The following method prints the first element of any array, regardless of rank: 


void WriteFirstValue (Array a)
{
  Console.Write (a.Rank + "-dimensional; ");

  // The indexers array will automatically initialize to all zeros, so
  // passing it into GetValue or SetValue will get/set the zero-based
  // (i.e., first) element in the array.

  int[] indexers = new int[a.Rank];
  Console.WriteLine ("First value is " + a.GetValue (indexers));
}

void Demo()
{
  int[] oneD = { 1, 2, 3 };
  int[,] twoD = { {5,6}, {8,9} };

  WriteFirstValue (oneD);   // 1-dimensional; First value is 1
  WriteFirstValue (twoD);   // 2-dimensional; First value is 5
}
For working with arrays of unknown type but known rank,
generics provide an easier and more efficient solution:


void WriteFirstValue<T> (T[] array)
{
  Console.WriteLine (array[0]);
}


SetValue throws an exception if the element is of an incompatible type for the 
array. 
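For instance, storing an int into a dynamically created string array fails at runtime. The following sketch (SetValueDemo is an illustrative name) assumes the failure surfaces as an InvalidCastException, which is what the Array.SetValue documentation specifies for a value that cannot be cast to the element type:

```csharp
using System;

static class SetValueDemo
{
    // Returns true if storing an int into a string array throws InvalidCastException.
    public static bool StoringIntInStringArrayThrows()
    {
        Array a = Array.CreateInstance (typeof (string), 2);
        try
        {
            a.SetValue (123, 0);    // An int is not compatible with string elements
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```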


When an array is instantiated, whether via language syntax or Array.CreateInstance,
its elements are automatically initialized. For arrays with reference-type
elements, this means writing nulls; for arrays with value-type elements, this means
calling the value type’s default constructor (effectively zeroing the members). The
Array class also provides this functionality on demand via the Clear method:


public static void Clear (Array array, int index, int length); 


This method doesn't affect the size of the array. This is in contrast to the usual use of 
Clear (such as in ICollection<T>.Clear) whereby the collection is reduced to zero 
elements. 
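To illustrate, the following sketch (ClearDemo is a hypothetical name) clears a range of elements in a value-type array and in a reference-type array; in both cases the array keeps its original length:

```csharp
using System;

static class ClearDemo
{
    // Value-type elements are reset to their default (0 for int).
    public static int[] ClearInts()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        Array.Clear (numbers, 1, 2);   // Clears numbers[1] and numbers[2]
        return numbers;                // Still 5 elements: { 1, 0, 0, 4, 5 }
    }

    // Reference-type elements are reset to null.
    public static string[] ClearStrings()
    {
        string[] names = { "a", "b", "c" };
        Array.Clear (names, 0, 1);     // names[0] becomes null
        return names;                  // { null, "b", "c" }
    }
}
```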


Enumeration 


Arrays are easily enumerated with a foreach statement:

int[] myArray = { 1, 2, 3 };
foreach (int val in myArray)
  Console.WriteLine (val);

You can also enumerate using the static Array.ForEach method, defined as follows:

public static void ForEach<T> (T[] array, Action<T> action);

This uses an Action delegate, with this signature:

public delegate void Action<T> (T obj);

Here's the first example rewritten with Array.ForEach:

Array.ForEach (new[] { 1, 2, 3 }, Console.WriteLine);
Length and Rank


Array provides the following methods and properties for querying length and rank: 





public int GetLength (int dimension);
public long GetLongLength (int dimension);

public int Length { get; }
public long LongLength { get; }

public int GetLowerBound (int dimension);
public int GetUpperBound (int dimension);

public int Rank { get; }    // Returns number of dimensions in array


GetLength and GetLongLength return the length for a given dimension (pass 0 for a
single-dimensional array), and Length and LongLength return the total number of
elements in the array, all dimensions included.
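As a small illustration (DimensionsDemo is a hypothetical name), here is what these members report for a 3 × 4 rectangular array:

```csharp
using System;

static class DimensionsDemo
{
    public static (int Rank, int Rows, int Cols, int Total) Describe()
    {
        int[,] grid = new int [3, 4];
        return (grid.Rank,            // 2
                grid.GetLength (0),   // 3  (first dimension)
                grid.GetLength (1),   // 4  (second dimension)
                grid.Length);         // 12 (all dimensions: 3 × 4)
    }
}
```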


GetLowerBound and GetUpperBound are useful with nonzero-indexed arrays. For any
given dimension, GetUpperBound returns the same result as GetLowerBound plus
GetLength, minus 1 (the upper bound is the last valid index, not one past it).
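The following sketch (BoundsDemo is an illustrative name) creates a one-dimensional, nonzero-based array with the Array.CreateInstance overload that takes a lengths array and a lowerBounds array, and then queries its bounds:

```csharp
using System;

static class BoundsDemo
{
    public static (int Lower, int Upper, int Length, object First) Run()
    {
        // A one-dimensional int array of length 3 whose first index is 1
        Array a = Array.CreateInstance (typeof (int), new[] { 3 }, new[] { 1 });
        a.SetValue (10, 1);                 // The first element lives at index 1, not 0

        int lower = a.GetLowerBound (0);    // 1
        int upper = a.GetUpperBound (0);    // 3  (= lower + length - 1)
        return (lower, upper, a.GetLength (0), a.GetValue (1));
    }
}
```

Such an array cannot be cast to int[], which is one reason the text advises against exposing nonzero-based arrays publicly.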


Searching 


The Array class offers a range of methods for finding elements within a
one-dimensional array:


BinarySearch methods 
For rapidly searching a sorted array for a particular item 


IndexOf/LastIndexOf methods
For searching unsorted arrays for a particular item


Find/FindLast/FindIndex/FindLastIndex/FindAll/Exists/TrueForAll
For searching unsorted arrays for item(s) that satisfy a given Predicate<T>


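None of the array-searching methods throws an exception if the specified value is
not found. Instead, if an item is not found, methods returning an integer return a
negative value: -1 for IndexOf, LastIndexOf, FindIndex, and FindLastIndex (assuming
a zero-indexed array) and, for BinarySearch, the bitwise complement of the index at
which the item would be inserted. Methods returning a generic type return the
type’s default value (e.g., 0 for an int, or null for a string).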


The binary search methods are fast, but they work only on sorted arrays and require
that the elements be compared for order rather than simply equality. To this effect,
the binary search methods can accept an IComparer or IComparer<T> object to
arbitrate on ordering decisions (see “Plugging in Equality and Order” on page 360).
This must be consistent with any comparer used in originally sorting the array. If no
comparer is provided, the type’s default ordering algorithm will be applied based on
its implementation of IComparable/IComparable<T>.
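As a sketch of the not-found case (BinarySearchDemo is a hypothetical name): when BinarySearch does not find a value, it returns a negative number that is the bitwise complement of the index at which the value would be inserted to keep the array sorted, so applying ~ recovers that position:

```csharp
using System;

static class BinarySearchDemo
{
    public static (int Found, int Missing, int InsertAt) Run()
    {
        int[] sorted = { 10, 20, 30, 40 };                // Must already be sorted

        int found    = Array.BinarySearch (sorted, 30);   // 2
        int missing  = Array.BinarySearch (sorted, 25);   // Negative: ~2, i.e., -3
        int insertAt = ~missing;                          // 2: where 25 would go
        return (found, missing, insertAt);
    }
}
```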


The IndexOf and LastIndexOf methods perform a simple enumeration over the 
array, returning the position of the first (or last) element that matches the given 
value. 


The predicate-based searching methods allow a method delegate or lambda expression
to arbitrate on whether a given element is a match. A predicate is simply a delegate
accepting an object and returning true or false:


public delegate bool Predicate<T> (T obj);


In the following example, we search an array of strings for a name containing the
letter “a”:
static void Main()
{
  string[] names = { "Rodney", "Jack", "Jill" };
  string match = Array.Find (names, ContainsA);
  Console.WriteLine (match);     // Jack
}

static bool ContainsA (string name) { return name.Contains ("a"); }


Here's the same code shortened with an anonymous method: 


string[] names = { "Rodney", "Jack", "Jill" }; 
string match = Array.Find (names, delegate (string name) 
{ return name.Contains ("a"); } ); 


A lambda expression shortens it further: 


string[] names = { "Rodney", "Jack", "Jill" }; 
string match = Array.Find (names, n => n.Contains ("a")); // Jack 


FindAll returns an array of all items satisfying the predicate. In fact, it’s equivalent
to Enumerable.Where in the System.Linq namespace, except that FindAll returns
an array of matching items rather than an IEnumerable<T> of the same.


Exists returns true if any array member satisfies the given predicate, and is
equivalent to Any in System.Linq.Enumerable.


TrueForAll returns true if all items satisfy the predicate, and is equivalent to All in
System.Linq.Enumerable.
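To show the correspondence, the following sketch (PredicateSearchDemo is a hypothetical name) runs FindAll, Exists, and TrueForAll over the names array from the earlier example, alongside their LINQ equivalents:

```csharp
using System;
using System.Linq;

static class PredicateSearchDemo
{
    static readonly string[] Names = { "Rodney", "Jack", "Jill" };

    public static string[] WithA()       => Array.FindAll (Names, n => n.Contains ("a"));      // { "Jack" }
    public static bool AnyWithA()        => Array.Exists (Names, n => n.Contains ("a"));       // true
    public static bool AllStartWithJ()   => Array.TrueForAll (Names, n => n.StartsWith ("J")); // false

    // LINQ equivalents (these work on any IEnumerable<T>, not just arrays)
    public static string[] WithALinq()     => Names.Where (n => n.Contains ("a")).ToArray();
    public static bool AnyWithALinq()      => Names.Any (n => n.Contains ("a"));
    public static bool AllStartWithJLinq() => Names.All (n => n.StartsWith ("J"));
}
```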

Sorting 

Array has the following built-in sorting methods: 


// For sorting a single array: 


public static void Sort<T> (T[] array); 
public static void Sort (Array array); 


// For sorting a pair of arrays: 


public static void Sort<TKey,TValue> (TKey[] keys, TValue[] items); 
public static void Sort (Array keys, Array items); 


Each of these methods is additionally overloaded to also accept the following: 


int index                   // Starting index at which to begin sorting
int length                  // Number of elements to sort
IComparer<T> comparer       // Object making ordering decisions
Comparison<T> comparison    // Delegate making ordering decisions

The following illustrates the simplest use of Sort:


int[] numbers = { 3, 2, 1 }; 
Array.Sort (numbers); // Array is now { 1, 2, 3 } 
The methods accepting a pair of arrays work by rearranging the items of each array
in tandem, basing the ordering decisions on the first array. In the next example,
both the numbers and their corresponding words are sorted into numerical order:


int[] numbers = { 3, 2, 1 }; 
string[] words = { "three", "two", "one" }; 
Array.Sort (numbers, words); 


// numbers array is now { 1, 2, 3 } 
// words array is now { "one", "two", "three" } 


Array.Sort requires that the elements in the array implement IComparable (see
“Order Comparison” on page 306 in Chapter 6). This means that most built-in C#
types (such as integers, as in the preceding example) can be sorted. If the elements
are not intrinsically comparable or you want to override the default ordering, you
must provide Sort with a custom comparison provider that reports on the relative
position of two elements. There are two ways to do this:


• Via a helper object that implements IComparer/IComparer<T> (see “Plugging in
  Equality and Order” on page 360)


• Via a Comparison delegate:
  public delegate int Comparison<T> (T x, T y);


The Comparison delegate follows the same semantics as IComparer<T>.Compare:
if x comes before y, a negative integer is returned; if x comes after y, a positive
integer is returned; if x and y have the same sorting position, 0 is returned.


In the following example, we sort an array of integers such that the odd numbers 
come first: 


int[] numbers = { 1, 2, 3, 4, 5 };
Array.Sort (numbers, (x, y) => x % 2 == y % 2 ? 0 : x % 2 == 1 ? -1 : 1);


// numbers array is now { 1, 3, 5, 2, 4 } 
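For the helper-object approach (the first of the two options listed earlier), the following sketch uses a hypothetical LengthComparer class that implements IComparer<string> and orders strings by length, shortest first; ComparerDemo is likewise an illustrative name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders strings by length, shortest first.
class LengthComparer : IComparer<string>
{
    public int Compare (string x, string y) => x.Length.CompareTo (y.Length);
}

static class ComparerDemo
{
    public static string[] SortByLength()
    {
        string[] words = { "banana", "fig", "apple" };
        Array.Sort (words, new LengthComparer());   // { "fig", "apple", "banana" }
        return words;
    }
}
```

A comparer object is worth the extra type when the same ordering is reused in several places (for example, to sort an array and later to BinarySearch it consistently).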


As an alternative to calling Sort, you can use LINQ’s OrderBy
and ThenBy operators. Unlike Array.Sort, the LINQ operators
don't alter the original array, instead emitting the sorted
result in a fresh IEnumerable<T> sequence.


Reversing Elements 


The following Array methods reverse the order of all—or a portion of—elements in 
an array: 


public static void Reverse (Array array); 
public static void Reverse (Array array, int index, int length); 
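A minimal sketch (ReverseDemo is a hypothetical name) showing both overloads; the second reverses only the length elements starting at index:

```csharp
using System;

static class ReverseDemo
{
    public static int[] ReverseAll()
    {
        int[] a = { 1, 2, 3, 4, 5 };
        Array.Reverse (a);            // { 5, 4, 3, 2, 1 }
        return a;
    }

    public static int[] ReverseMiddle()
    {
        int[] a = { 1, 2, 3, 4, 5 };
        Array.Reverse (a, 1, 3);      // Reverses a[1]..a[3]: { 1, 4, 3, 2, 5 }
        return a;
    }
}
```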
Copying
Array provides four methods to perform shallow copying: Clone, CopyTo, Copy, and
ConstrainedCopy. The former two are instance methods; the latter two are static
methods.


The Clone method returns a whole new (shallow-copied) array. The CopyTo and
Copy methods copy a contiguous subset of the array. Copying a multidimensional
rectangular array requires you to map the multidimensional index to a linear index.
For example, the middle square (position [1,1]) in a 3 × 3 array is represented
with the index 4, from the calculation: 1 × 3 + 1. The source and destination ranges
can overlap without causing a problem.
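The following sketch (CopyDemo is a hypothetical name) applies the row × columns + column mapping with Array.Copy. It assumes both arrays have the same rank, which Array.Copy requires for multidimensional arrays, and also shows an overlapping copy within a single array:

```csharp
using System;

static class CopyDemo
{
    // Copies the middle square, linear index 1 × 3 + 1 = 4, of a 3 × 3 array.
    public static int MiddleSquare()
    {
        int[,] source = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
        int[,] target = new int [1, 1];
        Array.Copy (source, 1 * 3 + 1, target, 0, 1);
        return target [0, 0];                              // 5
    }

    // Copies row 1 of a 3 × 3 array into row 0 of another.
    public static int[,] CopyRow()
    {
        int[,] source = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
        int[,] target = new int [3, 3];
        int cols = source.GetLength (1);                   // 3
        Array.Copy (source, 1 * cols, target, 0, cols);    // Target row 0 = { 4, 5, 6 }
        return target;
    }

    // Overlapping source and destination ranges within the same array.
    public static int[] ShiftRight()
    {
        int[] a = { 1, 2, 3, 4, 5 };
        Array.Copy (a, 0, a, 1, 4);    // { 1, 1, 2, 3, 4 }: behaves as if the source were buffered first
        return a;
    }
}
```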


ConstrainedCopy performs an atomic operation: if all of the requested elements
cannot be successfully copied (due to a type error, for instance), the operation is
rolled back.


Array also provides an AsReadOnly method that returns a wrapper preventing
elements from being reassigned.


Converting and Resizing 


Array.ConvertAll creates and returns a new array of element type TOutput, calling
the supplied Converter delegate to copy over the elements. Converter is defined as
follows:


public delegate TOutput Converter<TInput,TOutput> (TInput input);

The following converts an array of floats to an array of integers:


float[] reals = { 1.3f, 1.5f, 1.8f }; 
int[] wholes = Array.ConvertAll (reals, r => Convert.ToInt32 (r)); 


// wholes array is { 1, 2, 2 } 


The Resize method works by creating a new array and copying over the elements, 
returning the new array via the reference parameter. However, any references to the 
original array in other objects will remain unchanged. 


The System.Linq namespace offers an additional buffet of 
extension methods suitable for array conversion. These meth- 
ods return an IEnumerable<T>, which you can convert back to 
an array via Enumerable’s ToArray method. 


Lists, Queues, Stacks, and Sets 


.NET Core provides a basic set of concrete collection classes that implement the
interfaces described in this chapter. This section concentrates on the list-like
collections (versus the dictionary-like collections, which we cover in “Dictionaries”
on page 344). As with the interfaces we discussed previously, you usually have a
choice of generic or nongeneric versions of each type. In terms of flexibility and
performance, the generic classes win, making their nongeneric counterparts
redundant except for backward compatibility. This differs from the situation with
collection interfaces, for which the nongeneric versions are still occasionally useful.


Of the classes described in this section, the generic List class is the most commonly 
used. 


List<T> and ArrayList 


The generic List and nongeneric ArrayList classes provide a dynamically sized 
array of objects and are among the most commonly used of the collection classes. 
ArrayList implements IList, whereas List<T> implements both IList and 
IList<T> (and the read-only version, IReadOnlyList<T>). Unlike with arrays, all 
interfaces are implemented publicly, and methods such as Add and Remove are 
exposed and work as you would expect. 


Internally, List<T> and ArrayList work by maintaining an internal array of objects, 
replaced with a larger array upon reaching capacity. Appending elements is efficient 
(because there is usually a free slot at the end), but inserting elements can be slow 
(because all elements after the insertion point must be shifted to make a free slot), as 
can removing elements (especially near the start). As with arrays, searching is
efficient if the BinarySearch method is used on a list that has been sorted, but it is
otherwise inefficient because each item must be individually checked. 


List<T> is up to several times faster than ArrayList if T is a 
value type, because List<T> avoids the overhead of boxing 
and unboxing elements. 


List<T> and ArrayList provide constructors that accept an existing collection of 
elements: these copy each element from the existing collection into the new List<T> 
or ArrayList: 


public class List<T> : IList<T>, IReadOnlyList<T>
{
  public List ();
  public List (IEnumerable<T> collection);
  public List (int capacity);

  // Add+Insert
  public void Add (T item);
  public void AddRange (IEnumerable<T> collection);
  public void Insert (int index, T item);
  public void InsertRange (int index, IEnumerable<T> collection);

  // Remove
  public bool Remove (T item);
  public void RemoveAt (int index);
  public void RemoveRange (int index, int count);
  public int RemoveAll (Predicate<T> match);

  // Indexing
  public T this [int index] { get; set; }
  public List<T> GetRange (int index, int count);
  public Enumerator<T> GetEnumerator();

  // Exporting, copying, and converting:
  public T[] ToArray();
  public void CopyTo (T[] array);
  public void CopyTo (T[] array, int arrayIndex);
  public void CopyTo (int index, T[] array, int arrayIndex, int count);
  public ReadOnlyCollection<T> AsReadOnly();
  public List<TOutput> ConvertAll<TOutput> (Converter <T,TOutput>
                                            converter);

  // Other:
  public void Reverse();              // Reverses order of elements in list.
  public int Capacity { get; set; }   // Forces expansion of internal array.
  public void TrimExcess();           // Trims internal array back to size.
  public void Clear();                // Removes all elements, so Count = 0.
}

public delegate TOutput Converter <TInput, TOutput> (TInput input);


In addition to these members, List<T> provides instance versions of all of Array’s
searching and sorting methods.


The following code demonstrates List’s properties and methods (for examples of
searching and sorting, see “The Array Class” on page 327):


var words = new List<string>(); // New string-typed list 


words.Add ("melon"); 

words.Add ("avocado"); 

words.AddRange (new[] { "banana", "plum" } ); 

words.Insert (0, "lemon");                          // Insert at start
words.InsertRange (0, new[] { "peach", "nashi" });  // Insert at start 


words.Remove ("melon"); 

words.RemoveAt (3); // Remove the 4th element 
words.RemoveRange (0, 2); // Remove first 2 elements 
// Remove all strings starting in 'n': 
words.RemoveAll (s => s.StartsWith ("n"));


Console.WriteLine (words [0]); // first word 
Console.WriteLine (words [words.Count - 1]); // last word 
foreach (string s in words) Console.WriteLine (s); // all words 
List<string> subset = words.GetRange (1, 2); // 2nd->3rd words 
string[] wordsArray = words.ToArray(); // Creates a new typed array 


// Copy first two elements to the end of an existing array: 
string[] existing = new string [1000]; 
words.CopyTo (0, existing, 998, 2); 


List<string> upperCaseWords = words.ConvertAll (s => s.ToUpper());
List<int> lengths = words.ConvertAll (s => s.Length);










The nongeneric ArrayList class requires clumsy casts—as the following example 
demonstrates: 


ArrayList al = new ArrayList(); 

al.Add ("hello"); 

string first = (string) al [0]; 

string[] strArr = (string[]) al.ToArray (typeof (string)); 
Such casts cannot be verified by the compiler; the following compiles successfully 
but then fails at runtime: 


int first = (int) al [0]; // Runtime exception 


An ArrayList is functionally similar to List<object>. Both 
are useful when you need a list of mixed-type elements that 
share no common base type (other than object). A possible 
advantage of choosing an ArrayList, in this case, would be if 
you need to deal with the list using reflection (Chapter 19). 
Reflection is easier with a nongeneric ArrayList than a 
List<object>. 


If you import the System.Linq namespace, you can convert an ArrayList to a
generic List by calling Cast and then ToList: 


ArrayList al = new ArrayList(); 
al.AddRange (new[] { 1, 5, 9 } ); 
List<int> list = al.Cast<int>().ToList(); 


Cast and ToList are extension methods in the System.Linq.Enumerable class.


LinkedList<T> 


LinkedList<T> is a generic doubly linked list (see Figure 7-4). A doubly linked list 
is a chain of nodes in which each references the node before, the node after, and the 
actual element. Its main benefit is that an element can always be inserted efficiently 
anywhere in the list because it just involves creating a new node and updating a few 
references. However, finding where to insert the node in the first place can be slow 
because there's no intrinsic mechanism to index directly into a linked list; each node 
must be traversed, and binary-chop searches are not possible. 
Figure 7-4. LinkedList<T>


LinkedList<T> implements IEnumerable<T> and ICollection<T> (and their nongeneric
versions), but not IList<T> because access by index is not supported. List
nodes are implemented via the following class:


public sealed class LinkedListNode<T>
{
  public LinkedList<T> List { get; }
  public LinkedListNode<T> Next { get; }
  public LinkedListNode<T> Previous { get; }
  public T Value { get; set; }
}


When adding a node, you can specify its position either relative to another node or 
at the start/end of the list. LinkedList<T> provides the following methods for this: 


public void AddFirst (LinkedListNode<T> node);
public LinkedListNode<T> AddFirst (T value);

public void AddLast (LinkedListNode<T> node);
public LinkedListNode<T> AddLast (T value);

public void AddAfter (LinkedListNode<T> node, LinkedListNode<T> newNode);
public LinkedListNode<T> AddAfter (LinkedListNode<T> node, T value);

public void AddBefore (LinkedListNode<T> node, LinkedListNode<T> newNode);
public LinkedListNode<T> AddBefore (LinkedListNode<T> node, T value);


Similar methods are provided to remove elements: 


public void Clear();

public void RemoveFirst();
public void RemoveLast();

public bool Remove (T value);
public void Remove (LinkedListNode<T> node);


LinkedList<T> has internal fields to keep track of the number of elements in the list 
as well as the head and tail of the list. These are exposed in the following public 
properties: 


public int Count { get; } // Fast 
public LinkedListNode<T> First { get; } // Fast 
public LinkedListNode<T> Last { get; } // Fast 


LinkedList<T> also supports the following searching methods (each requiring that 
the list be internally enumerated): 


public bool Contains (T value); 
public LinkedListNode<T> Find (T value); 
public LinkedListNode<T> FindLast (T value); 


Finally, LinkedList<T> supports copying to an array for indexed processing and 
obtaining an enumerator to support the foreach statement: 


public void CopyTo (T[] array, int index); 
public Enumerator<T> GetEnumerator(); 


Here's a demonstration on the use of LinkedList<string>: 


var tune = new LinkedList<string>(); 


tune.AddFirst ("do"); // do 

tune.AddLast ("so"); // do - so 

tune.AddAfter (tune.First, "re"); // do - re - so 
tune.AddAfter (tune.First.Next, "mi"); // do - re - mi - so 
tune.AddBefore (tune.Last, "fa"); // do - re - mi - fa - so 
tune.RemoveFirst(); // re - mi - fa - so 
tune.RemoveLast(); // re - mi - fa 


LinkedListNode<string> miNode = tune.Find ("mi"); 
tune.Remove (miNode) ; // re - fa 
tune.AddFirst (miNode); // mi - re - fa 


foreach (string s in tune) Console.WriteLine (s); 


Queue<T> and Queue 


Queue<T> and Queue are first-in, first-out (FIFO) data structures, providing
methods to Enqueue (add an item to the tail of the queue) and Dequeue (retrieve and
remove the item at the head of the queue). A Peek method is also provided to return
the element at the head of the queue without removing it, and there is a Count
property (useful in checking that elements are present before dequeuing).


Although queues are enumerable, they do not implement IList<T>/IList, because 
members cannot be accessed directly by index. A ToArray method is provided, 
however, for copying the elements to an array from which they can be randomly 
accessed: 







public class Queue<T> : IEnumerable<T>, ICollection, IEnumerable 
{ 
public Queue(); 
public Queue (IEnumerable<T> collection) ; // Copies existing elements 
public Queue (int capacity); // To lessen auto-resizing 
public void Clear(); 
public bool Contains (T item); 
public void CopyTo (T[] array, int arrayIndex); 
public int Count { get; } 
public T Dequeue(); 
public void Enqueue (T item); 
public Enumerator<T> GetEnumerator(); // To support foreach 
public T Peek(); 
public T[] ToArray(); 
public void TrimExcess(); 


} 


The following is an example of using Queue<int>: 


var q = new Queue<int>(); 
q.Enqueue (10); 
q.Enqueue (20); 


int[] data = q.ToArray(); // Exports to an array 
Console.WriteLine (q.Count);              // "2"
Console.WriteLine (q.Peek()); // "10" 


Console.WriteLine (q.Dequeue()); // "10" 
Console.WriteLine (q.Dequeue()); // "20" 
Console.WriteLine (q.Dequeue()); // Throws an exception (queue empty) 


Queues are implemented internally using an array that’s resized as required—much 
like the generic List class. The queue maintains indexes that point directly to the 
head and tail elements; therefore, enqueuing and dequeuing are extremely quick 
operations (except when an internal resize is required). 
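Because Dequeue throws when the queue is empty, a common pattern is to check Count before each call. This sketch (QueueDemo and Drain are hypothetical names) drains a queue that way and shows that items come out in first-in, first-out order:

```csharp
using System.Collections.Generic;

static class QueueDemo
{
    // Dequeues every item, checking Count first so Dequeue never throws.
    public static List<string> Drain (Queue<string> queue)
    {
        var processed = new List<string>();
        while (queue.Count > 0)
            processed.Add (queue.Dequeue());   // Items come out in the order they went in
        return processed;
    }
}
```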


Stack<T> and Stack 


Stack<T> and Stack are last-in, first-out (LIFO) data structures, providing methods
to Push (add an item to the top of the stack) and Pop (retrieve and remove an
element from the top of the stack). A nondestructive Peek method is also provided, as
is a Count property and a ToArray method for exporting the data for random access:


public class Stack<T> : IEnumerable<T>, ICollection, IEnumerable 
{ 
public Stack(); 
public Stack (IEnumerable<T> collection) ; // Copies existing elements 
public Stack (int capacity); // Lessens auto-resizing 
public void Clear(); 
public bool Contains (T item); 
public void CopyTo (T[] array, int arrayIndex); 
public int Count { get; } 
public Enumerator<T> GetEnumerator(); // To support foreach 
public T Peek(); 
public T Pop(); 
public void Push (T item); 
public T[] ToArray();
public void TrimExcess(); 


} 


The following example demonstrates Stack<int>: 


var s = new Stack<int>(); 


s.Push (1); // Stack = 1 
s.Push (2); // Stack = 1,2 
s.Push (3); // Stack = 1,2,3 
Console.WriteLine (s.Count); // Prints 3 

Console.WriteLine (s.Peek()); // Prints 3, Stack = 1,2,3 
Console.WriteLine (s.Pop()); // Prints 3, Stack = 1,2 
Console.WriteLine (s.Pop()); // Prints 2, Stack = 1 
Console.WriteLine (s.Pop()); // Prints 1, Stack = <empty> 
Console.WriteLine (s.Pop()); // Throws exception 


Stacks are implemented internally with an array that’s resized as required, as with 
Queue<T> and List<T>. 
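The LIFO ordering makes a stack a natural fit for matching nested brackets: the most recently opened bracket must be the first one closed. This is an illustrative sketch (BracketChecker and IsBalanced are hypothetical names), not a full parser:

```csharp
using System.Collections.Generic;

static class BracketChecker
{
    // Returns true if every (, [ and { is closed in the correct (LIFO) order.
    public static bool IsBalanced (string text)
    {
        var open = new Stack<char>();
        foreach (char c in text)
        {
            if (c == '(' || c == '[' || c == '{')
                open.Push (c);
            else if (c == ')' || c == ']' || c == '}')
            {
                if (open.Count == 0) return false;          // Closer with no opener
                char expected = c == ')' ? '(' : c == ']' ? '[' : '{';
                if (open.Pop() != expected) return false;   // Most recent opener must match
            }
        }
        return open.Count == 0;                             // No unclosed openers left
    }
}
```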


BitArray 


A BitArray is a dynamically sized collection of compacted bool values. It is more
memory efficient than both a simple array of bool and a generic List of bool
because it uses only one bit for each value, whereas the bool type otherwise
occupies one byte for each value.


BitArray’s indexer reads and writes individual bits: 


var bits = new BitArray(2); 
bits[1] = true; 


There are four bitwise operator methods (And, Or, Xor, and Not). All but the last 
accept another BitArray: 


bits.Xor (bits); // Bitwise exclusive-OR bits with itself 
Console.WriteLine (bits[1]); // False 
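The bitwise methods modify the instance they are called on (and also return it), so the following sketch (BitArrayDemo is a hypothetical name) copies the left operand with BitArray's copy constructor before calling And, Or, or Not:

```csharp
using System.Collections;

static class BitArrayDemo
{
    public static (bool[] And, bool[] Or, bool[] Not) Run()
    {
        var a = new BitArray (new[] { true, true, false, false });
        var b = new BitArray (new[] { true, false, true, false });

        // And, Or, Xor and Not overwrite the instance they're called on, so work on copies.
        var andResult = new BitArray (a).And (b);   // 1000
        var orResult  = new BitArray (a).Or (b);    // 1110
        var notResult = new BitArray (a).Not();     // 0011

        return (ToBools (andResult), ToBools (orResult), ToBools (notResult));
    }

    static bool[] ToBools (BitArray bits)
    {
        var result = new bool [bits.Length];
        bits.CopyTo (result, 0);                    // BitArray can copy into a bool[]
        return result;
    }
}
```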


HashSet<T> and SortedSet<T> 


HashSet<T> and SortedSet<T> are generic collections new to Framework 3.5 and
4.0, respectively. Both have the following distinguishing features:
• Their Contains methods execute quickly (using a hash-based lookup in HashSet<T>
  and a tree search in SortedSet<T>).


• They do not store duplicate elements and silently ignore requests to add
  duplicates.


• You cannot access an element by position.


SortedSet<T> keeps elements in order, whereas HashSet<T> does not. 
The commonality of these types is captured by the interface
ISet<T>.


For historical reasons, HashSet<T> resides in System.Core.dll 
(whereas SortedSet<T> and ISet<T> reside in System.dll). 


HashSet<T> is implemented with a hashtable that stores just keys; SortedSet<T> is 
implemented with a red/black tree. 


Both collections implement ICollection<T> and offer methods that you would
expect, such as Contains, Add, and Remove. In addition, there’s a predicate-based
removal method called RemoveWhere.
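For example, the following sketch (RemoveWhereDemo is a hypothetical name) removes every even number from a HashSet<int>; RemoveWhere returns the number of elements it removed. The result is copied into a SortedSet<int> only so that the printed order is predictable:

```csharp
using System.Collections.Generic;

static class RemoveWhereDemo
{
    public static (int Removed, string Remaining) Run()
    {
        var numbers = new HashSet<int> { 1, 2, 3, 4, 5, 6 };
        int removed = numbers.RemoveWhere (n => n % 2 == 0);   // Removes 2, 4, 6; returns 3
        var sorted = new SortedSet<int> (numbers);              // Sorted only for predictable output
        return (removed, string.Join (",", sorted));            // (3, "1,3,5")
    }
}
```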


The following constructs a HashSet<char> from an existing collection, tests for 
membership, and then enumerates the collection (notice the absence of duplicates): 


var letters = new HashSet<char> ("the quick brown fox");


Console.WriteLine (letters.Contains ('t')); // true 
Console.WriteLine (letters.Contains ('j')); // false 


foreach (char c in letters) Console.Write (c); // the quickbrownfx 


(We can pass a string into HashSet<char>’s constructor because string implements
IEnumerable<char>.)


The really interesting methods are the set operations. The following set operations 
are destructive in that they modify the set: 


public void UnionWith (IEnumerable<T> other); // Adds 

public void IntersectWith (IEnumerable<T> other);  // Removes 
public void ExceptWith (IEnumerable<T> other);  // Removes 
public void SymmetricExceptWith (IEnumerable<T> other); // Removes 


whereas the following methods simply query the set and so are nondestructive: 


public bool IsSubsetOf (IEnumerable<T> other); 
public bool IsProperSubsetOf (IEnumerable<T> other); 
public bool IsSupersetOf (IEnumerable<T> other); 
public bool IsProperSupersetOf (IEnumerable<T> other); 
public bool Overlaps (IEnumerable<T> other); 
public bool SetEquals (IEnumerable<T> other); 
UnionWith adds all the elements in the second set to the original set (excluding
duplicates). IntersectWith removes the elements that are not in both sets. We can
extract all of the vowels from our set of characters as follows:





var letters = new HashSet<char> ("the quick brown fox");
letters.IntersectWith ("aeiou"); 
foreach (char c in letters) Console.Write (c); // euio 


ExceptWith removes the specified elements from the source set. Here, we strip all 
vowels from the set: 
var letters = new HashSet<char> ("the quick brown fox");
letters.ExceptWith ("aeiou");
foreach (char c in letters) Console.Write (c); // th qckbrwnfx 


SymmetricExceptWith removes all but the elements that are unique to one set or the
other:


var letters = new HashSet<char> ("the quick brown fox");
letters.SymmetricExceptWith ("the lazy brown fox");
foreach (char c in letters) Console.Write (c); // quicklazy 


Note that because the set operation methods accept any IEnumerable<T> (which
HashSet<T> and SortedSet<T> both implement), you can use another type of set (or
collection) as the argument to any of the set operation methods.


SortedSet<T> offers all the members of HashSet<T>, plus the following: 


public virtual SortedSet<T> GetViewBetween (T lowerValue, T upperValue) 
public IEnumerable<T> Reverse() 

public T Min { get; } 

public T Max { get; } 


SortedSet<T> also accepts an optional IComparer<T> in its constructor (rather than 
an equality comparer). 


Here's an example of loading the same letters into a SortedSet<char>: 


var letters = new SortedSet<char> ("the quick brown fox"); 
foreach (char c in letters) Console.Write (c); // bcefhiknoqrtuwx


Following on from this, we can obtain the letters in the set between f and j as 
follows: 


foreach (char c in letters.GetViewBetween ('f', 'j')) 
Console.Write (c); // fhi 
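The remaining SortedSet<T> members can be sketched as follows. SortedSetDemo and its helper methods are hypothetical names. The second set passes an IComparer<char>, built with Comparer<char>.Create, to reverse the ordering. Note that the space character in the phrase sorts before any letter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedSetDemo
{
    // Ascending set of the sample letters (the space character sorts first).
    public static SortedSet<char> Letters() => new SortedSet<char> ("the quick brown fox");

    // Same letters, ordered by a custom IComparer<char> that reverses the natural order.
    public static SortedSet<char> Descending() =>
        new SortedSet<char> ("the quick brown fox",
                             Comparer<char>.Create ((x, y) => y.CompareTo (x)));

    public static string Between (char lower, char upper) =>
        new string (Letters().GetViewBetween (lower, upper).ToArray());

    // Calls SortedSet's own Reverse() method, then takes the first three elements.
    public static string LastThree() =>
        new string (Letters().Reverse().Take (3).ToArray());

    static void Main()
    {
        Console.WriteLine (Letters().Max);        // x
        Console.WriteLine (Between ('f', 'j'));   // fhi
        Console.WriteLine (LastThree());          // xwu
        Console.WriteLine (Descending().Min);     // x  (first element under the comparer)
    }
}
```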


Dictionaries 


A dictionary is a collection in which each element is a key/value pair. Dictionaries 
are most commonly used for lookups and sorted lists. 


The Framework defines a standard protocol for dictionaries, via the interfaces 
IDictionary and IDictionary <TKey, TValue> as well as a set of general-purpose 
dictionary classes. The classes each differ in the following regard: 

• Whether or not items are stored in sorted sequence
• Whether or not items can be accessed by position (index) as well as by key
• Whether generic or nongeneric
• Whether it’s fast or slow to retrieve items by key from a large dictionary


Table 7-1 summarizes each of the dictionary classes and how they differ in these 
respects. The performance times are in milliseconds and based on performing 
50,000 operations on a dictionary with integer keys and values on a 1.5 GHz PC.
(The differences in performance between generic and nongeneric counterparts 
using the same underlying collection structure are due to boxing, and show up only 
with value-type elements.) 


Table 7-1. Dictionary classes 


Type                    Internal           Retrieve    Memory overhead     Speed (ms):  Speed (ms):  Speed (ms):
                        structure          by index?   (avg. bytes/item)   random       sequential   retrieval
                                                                           insertion    insertion    by key
Unsorted
Dictionary<K,V>         Hashtable          No          22                  30           30           20
Hashtable               Hashtable          No          38                  50           50           30
ListDictionary          Linked list        No          36                  50,000       50,000       50,000
OrderedDictionary       Hashtable + array  Yes         59                  70           70           40
Sorted
SortedDictionary<K,V>   Red/black tree     No          20                  130          100          120
SortedList<K,V>         2xArray            Yes         2                   3,300        30           40
SortedList              2xArray            Yes         27                  4,500        100          180





In Big-O notation, retrieval time by key is as follows: 


• O(1) for Hashtable, Dictionary, and OrderedDictionary
• O(log n) for SortedDictionary and SortedList
• O(n) for ListDictionary (and nondictionary types such as List<T>)


n is the number of elements in the collection. 


IDictionary<TKey,TValue> 
IDictionary<TKey,TValue> defines the standard protocol for all key/value-based
collections. It extends ICollection<T> by adding methods and properties to access 
elements based on a key of arbitrary type: 





public interface IDictionary <TKey, TValue> :
  ICollection <KeyValuePair <TKey, TValue>>, IEnumerable
{
  bool ContainsKey (TKey key);
  bool TryGetValue (TKey key, out TValue value);
  void Add         (TKey key, TValue value);
  bool Remove      (TKey key);

  TValue this [TKey key]      { get; set; }  // Main indexer - by key
  ICollection <TKey> Keys     { get; }       // Returns just keys
  ICollection <TValue> Values { get; }       // Returns just values
}


From Framework 4.5, there’s also an interface called
IReadOnlyDictionary<TKey,TValue>, which defines the read-only
subset of dictionary members. This maps to the Windows
Runtime type IMapView<K,V> and was introduced primarily
for that reason.


To add an item to a dictionary, you either call Add or use the indexer’s set accessor;
the latter adds an item to the dictionary if the key is not already present (or updates
the item if it is present). Duplicate keys are forbidden in all dictionary
implementations, so calling Add twice with the same key throws an exception.


To retrieve an item from a dictionary, use either the indexer or the TryGetValue
method. If the key doesn’t exist, the indexer throws an exception, whereas
TryGetValue returns false. You can test for membership explicitly by calling
ContainsKey; however, this incurs the cost of two lookups if you subsequently
retrieve the item.
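The sketch below contrasts the ways of reading and writing by key described above. LookupDemo and its methods are hypothetical names. With the generic Dictionary, a missing key makes the indexer throw KeyNotFoundException, and a duplicate Add throws ArgumentException:

```csharp
using System;
using System.Collections.Generic;

static class LookupDemo
{
    // TryGetValue performs a single lookup and reports absence instead of throwing.
    public static string Describe (IDictionary<string, int> d, string key) =>
        d.TryGetValue (key, out int value) ? key + "=" + value : key + " missing";

    public static bool AddTwiceThrows()
    {
        var d = new Dictionary<string, int>();
        d.Add ("One", 1);
        try { d.Add ("One", 11); return false; }
        catch (ArgumentException) { return true; }      // duplicate key rejected
    }

    public static bool IndexerMissingThrows()
    {
        var d = new Dictionary<string, int>();
        try { _ = d ["Nope"]; return false; }
        catch (KeyNotFoundException) { return true; }   // no default value is returned
    }

    static void Main()
    {
        var d = new Dictionary<string, int>();
        d ["One"] = 1;                                   // indexer adds...
        d ["One"] = 11;                                  // ...then silently overwrites
        Console.WriteLine (Describe (d, "One"));         // One=11
        Console.WriteLine (Describe (d, "Two"));         // Two missing
        Console.WriteLine (AddTwiceThrows());            // True
        Console.WriteLine (IndexerMissingThrows());      // True
    }
}
```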


Enumerating directly over an IDictionary<TKey,TValue> returns a sequence of 
KeyValuePair structs: 


public struct KeyValuePair <TKey, TValue>
{
  public TKey Key     { get; }
  public TValue Value { get; }
}


You can enumerate over just the keys or values via the dictionary’s Keys/Values 
properties. 


We demonstrate the use of this interface with the generic Dictionary class in the 
following section. 


IDictionary 


The nongeneric IDictionary interface is the same in principle as
IDictionary<TKey,TValue>, apart from two important functional differences. It’s
important to be aware of these differences, because IDictionary appears in legacy
code (including the .NET Framework itself in places):


• Retrieving a nonexistent key via the indexer returns null (rather than throwing
  an exception).
• Contains tests for membership rather than ContainsKey.


Enumerating over a nongeneric IDictionary returns a sequence of DictionaryEntry
structs:

public struct DictionaryEntry
{
  public object Key   { get; set; }
  public object Value { get; set; }
}

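The following short sketch shows both differences, using Hashtable through the nongeneric IDictionary interface. HashtableDemo is a hypothetical name. Hashtable does not guarantee enumeration order, so the order of the printed entries may vary:

```csharp
using System;
using System.Collections;

static class HashtableDemo
{
    public static IDictionary Build()
    {
        IDictionary h = new Hashtable();
        h ["One"] = 1;         // values are boxed to object
        h ["Two"] = 2;
        return h;
    }

    static void Main()
    {
        IDictionary h = Build();
        Console.WriteLine (h ["Three"] == null);   // True: missing key returns null, no exception
        Console.WriteLine (h.Contains ("One"));    // True: Contains, not ContainsKey
        foreach (DictionaryEntry e in h)           // each element is a DictionaryEntry
            Console.WriteLine (e.Key + "; " + e.Value);
    }
}
```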
Dictionary<TKey,TValue> and Hashtable 


The generic Dictionary class is one of the most commonly used collections (along
with the List<T> collection). It uses a hashtable data structure to store keys and
values, and it is fast and efficient.


The nongeneric version of Dictionary<TKey,TValue> is 
called Hashtable; there is no nongeneric class called 
Dictionary. When we refer simply to Dictionary, we mean 
the generic Dictionary<TKey, TValue> class. 


Dictionary implements both the generic and nongeneric IDictionary interfaces, 
the generic IDictionary being exposed publicly. Dictionary is, in fact, a “textbook” 
implementation of the generic IDictionary. 


Here’s how to use it: 


var d = new Dictionary<string, int>();

d.Add ("One", 1);
d["Two"] = 2;     // adds to dictionary because "Two" not already present
d["Two"] = 22;    // updates dictionary because "Two" is now present
d["Three"] = 3;

Console.WriteLine (d["Two"]);                 // Prints "22"
Console.WriteLine (d.ContainsKey ("One"));    // true (fast operation)
Console.WriteLine (d.ContainsValue (3));      // true (slow operation)

int val = 0;
if (!d.TryGetValue ("onE", out val))
  Console.WriteLine ("No val");               // "No val" (case sensitive)

// Three different ways to enumerate the dictionary:

foreach (KeyValuePair<string, int> kv in d)        // One; 1
  Console.WriteLine (kv.Key + "; " + kv.Value);    // Two; 22
                                                   // Three; 3

foreach (string s in d.Keys) Console.Write (s);    // OneTwoThree
Console.WriteLine();
foreach (int i in d.Values) Console.Write (i);     // 1223


Its underlying hashtable works by converting each element's key into an integer 
hashcode—a pseudo-unique value—and then applying an algorithm to convert the 
hashcode into a hash key. This hash key is used internally to determine which 
“bucket” an entry belongs to. If the bucket contains more than one value, a linear 
search is performed on the bucket. A good hash function does not strive to return 
strictly unique hashcodes (which would usually be impossible); it strives to return 
hashcodes that are evenly distributed across the 32-bit integer space. This avoids the
scenario of ending up with a few very large (and inefficient) buckets. 


A dictionary can work with keys of any type, provided it’s able to determine equality
between keys and obtain hashcodes. By default, equality is determined via the key’s
object.Equals method, and the pseudo-unique hashcode is obtained via the key’s
GetHashCode method. You can change this behavior either by overriding these
methods or by providing an IEqualityComparer object when constructing the
dictionary. A common application of this is to specify a case-insensitive equality
comparer when using string keys:


var d = new Dictionary<string, int> (StringComparer.OrdinalIgnoreCase);


We discuss this further in “Plugging in Equality and Order” on page 360. 
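As a rough sketch of the overriding approach, the hypothetical GridPoint struct below supplies its own equality and hashcode. Two separately constructed but equal points therefore find the same dictionary entry. The hash formula is just one simple way to mix both fields and is an illustrative choice, not a prescribed one:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: two GridPoints are equal if their coordinates match.
readonly struct GridPoint : IEquatable<GridPoint>
{
    public readonly int X, Y;
    public GridPoint (int x, int y) { X = x; Y = y; }

    public bool Equals (GridPoint other) => X == other.X && Y == other.Y;
    public override bool Equals (object obj) => obj is GridPoint p && Equals (p);

    // Equal points must produce equal hashcodes; mixing both fields spreads
    // values across the int range rather than piling them into a few buckets.
    public override int GetHashCode() => unchecked (X * 31 + Y);
}

static class PointKeyDemo
{
    public static string Lookup (int x, int y)
    {
        var names = new Dictionary<GridPoint, string> { [new GridPoint (1, 2)] = "A" };
        return names.TryGetValue (new GridPoint (x, y), out var name) ? name : null;
    }

    static void Main()
    {
        Console.WriteLine (Lookup (1, 2));          // A (found via a different, equal instance)
        Console.WriteLine (Lookup (2, 1) == null);  // True
    }
}
```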


As with many other types of collections, you can improve the performance of a
dictionary slightly by specifying the collection’s expected size in the constructor,
avoiding or lessening the need for internal resizing operations.


The nongeneric version is named Hashtable and is functionally similar, apart from
differences stemming from it exposing the nongeneric IDictionary interface
discussed previously.


The downside to Dictionary and Hashtable is that the items are not sorted.
Furthermore, the original order in which the items were added is not retained. As with
all dictionaries, duplicate keys are not allowed.


When the generic collections were introduced in Framework
2.0, the CLR team chose to name them according to what they
represent (Dictionary, List) rather than how they are
internally implemented (Hashtable, ArrayList). Although this is
good because it gives them the freedom to later change the
implementation, it also means that the performance contract
(often the most important criterion in choosing one kind of
collection over another) is no longer captured in the name.


OrderedDictionary 


An OrderedDictionary is a nongeneric dictionary that maintains elements in the
same order that they were added. With an OrderedDictionary, you can access
elements both by index and by key.


An OrderedDictionary is not a sorted dictionary. 


An OrderedDictionary is a combination of a Hashtable and an ArrayList. This
means that it has all the functionality of a Hashtable, plus functions such as
RemoveAt, and an integer indexer. It also exposes Keys and Values properties that
return elements in their original order.

This class was introduced in .NET 2.0, yet peculiarly, there’s no generic version.
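Here is a brief sketch of access by index and by key. The OrderedDictionaryDemo name and the sample keys are hypothetical. OrderedDictionary lives in System.Collections.Specialized. The keys come back in insertion order, not alphabetical order:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

static class OrderedDictionaryDemo
{
    public static OrderedDictionary Build()
    {
        var od = new OrderedDictionary();
        od.Add ("Zebra", 1);
        od.Add ("Apple", 2);
        od.Add ("Mango", 3);
        return od;
    }

    // Joins the keys in the order OrderedDictionary reports them.
    public static string KeyOrder (OrderedDictionary od)
    {
        var parts = new List<string>();
        foreach (string key in od.Keys) parts.Add (key);
        return string.Join (",", parts);
    }

    static void Main()
    {
        var od = Build();
        Console.WriteLine (od [0]);          // 1  (index 0 is the first item added)
        Console.WriteLine (od ["Mango"]);    // 3  (lookup by key)
        od.RemoveAt (0);                     // removes "Zebra"
        Console.WriteLine (KeyOrder (od));   // Apple,Mango
    }
}
```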


ListDictionary and HybridDictionary 


ListDictionary uses a singly linked list to store the underlying data. It doesn’t
provide sorting, although it does preserve the original entry order of the items.
ListDictionary is extremely slow with large lists. Its only real “claim to fame” is its
efficiency with very small lists (fewer than 10 items).


HybridDictionary is a ListDictionary that automatically converts to a Hashtable
upon reaching a certain size, to address ListDictionary’s problems with
performance. The idea is to get a low memory footprint when the dictionary is small, and
good performance when the dictionary is large. However, given the overhead in
converting from one to the other (and the fact that a Dictionary is not excessively
heavy or slow in either scenario), you wouldn’t suffer unreasonably by using a
Dictionary to begin with.


Both classes come only in nongeneric form. 


Sorted Dictionaries 


The Framework provides two dictionary classes internally structured such that their 
content is always sorted by key: 


• SortedDictionary<TKey,TValue>
• SortedList<TKey,TValue>¹


(In this section, we abbreviate <TKey , TValue> to <,>.) 


SortedDictionary<,> uses a red/black tree: a data structure designed to perform 
consistently well in any insertion or retrieval scenario. 


SortedList<,> is implemented internally with an ordered array pair, providing fast
retrieval (via a binary-chop search) but poor insertion performance (because
existing values need to be shifted to make room for a new entry).


SortedDictionary<,> is much faster than SortedList<,> at inserting elements in a
random sequence (particularly with large lists). SortedList<,>, however, has an
extra ability: to access items by index as well as by key. With a sorted list, you can go
directly to the nth element in the sorting sequence (via the indexer on the
Keys/Values properties). To do the same with a SortedDictionary<,>, you must
manually enumerate over n items. (Alternatively, you could write a class that combines a
sorted dictionary with a list class.)


None of the three collections allows duplicate keys (as is the case with all 
dictionaries). 
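The positional-access difference can be sketched as follows. SortedIndexDemo and the sample keys are hypothetical. SortedList<,> exposes Keys as an IList<TKey>, so the nth key is a single indexer call. With SortedDictionary<,>, LINQ’s ElementAt has to walk the sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedIndexDemo
{
    static readonly int[] keys = { 50, 10, 40, 20, 30 };

    // SortedList<,>.Keys is an IList<TKey>: direct positional access.
    public static int NthKeyFromList (int n)
    {
        var list = new SortedList<int, string>();
        foreach (int k in keys) list.Add (k, "v" + k);
        return list.Keys [n];
    }

    // SortedDictionary<,> has no positional indexer: ElementAt enumerates n items.
    public static int NthKeyFromDictionary (int n)
    {
        var dict = new SortedDictionary<int, string>();
        foreach (int k in keys) dict.Add (k, "v" + k);
        return dict.Keys.ElementAt (n);
    }

    static void Main()
    {
        Console.WriteLine (NthKeyFromList (1));        // 20
        Console.WriteLine (NthKeyFromDictionary (1));  // 20
    }
}
```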





1 There’s also a functionally identical nongeneric version of this called SortedList. 
The following example uses reflection to load all of the methods defined in
System.Object into a sorted list keyed by name, and then enumerates their keys 
and values: 


// MethodInfo is in the System.Reflection namespace
var sorted = new SortedList <string, MethodInfo>();

foreach (MethodInfo m in typeof (object).GetMethods())
  sorted [m.Name] = m;

foreach (string name in sorted.Keys)
  Console.WriteLine (name);

foreach (MethodInfo m in sorted.Values)
  Console.WriteLine (m.Name + " returns a " + m.ReturnType);


Here’s the result of the first enumeration: 


Equals 
GetHashCode 
GetType 
ReferenceEquals 
ToString 


Here's the result of the second enumeration: 


Equals returns a System.Boolean 
GetHashCode returns a System.Int32
GetType returns a System.Type 
ReferenceEquals returns a System.Boolean 
ToString returns a System.String 


Notice that we populated the dictionary through its indexer. If we instead used the
Add method, it would throw an exception because the object class upon which
we’re reflecting overloads the Equals method, and you can’t add the same key twice
to a dictionary. By using the indexer, the later entry overwrites the earlier entry,
preventing this error.


You can store multiple members of the same key by making 
each value element a list: 


SortedList <string, List<MethodInfo>> 


Extending our example, the following retrieves the MethodInfo whose key is 
"GetHashCode", just as with an ordinary dictionary: 


Console.WriteLine (sorted ["GetHashCode"]); // Int32 GetHashCode() 


So far, everything we've done would also work with a SortedDictionary<,>. The 
following two lines, however, which retrieve the last key and value, work only with a 
sorted list: 


Console.WriteLine (sorted.Keys [sorted.Count - 1]); // ToString 
Console.WriteLine (sorted.Values[sorted.Count - 1].IsVirtual); // True 
Customizable Collections and Proxies


The collection classes discussed in previous sections are convenient in that you can 
directly instantiate them, but they don’t allow you to control what happens when an 
item is added to or removed from the collection. With strongly typed collections in 
an application, you sometimes need this control; for instance: 


• To fire an event when an item is added or removed
• To update properties because of the added or removed item
• To detect an “illegal” add/remove operation and throw an exception (for
  example, if the operation violates a business rule)


The .NET Framework provides collection classes for this exact purpose, in the 
System.Collections.ObjectModel namespace. These are essentially proxies or 
wrappers that implement IList<T> or IDictionary<,> by forwarding the methods 
through to an underlying collection. Each Add, Remove, or Clear operation is routed 
via a virtual method that acts as a “gateway” when overridden. 


Customizable collection classes are commonly used for publicly exposed
collections; for instance, a collection of controls exposed publicly on a
System.Windows.Forms.Form class.


Collection<T> and CollectionBase


The Collection<T> class is a customizable wrapper for List<T>.


As well as implementing IList<T> and IList, it defines four additional virtual 
methods and a protected property, as follows: 


public class Collection<T> :
  IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
  // ...

  protected virtual void ClearItems();
  protected virtual void InsertItem (int index, T item);
  protected virtual void RemoveItem (int index);
  protected virtual void SetItem (int index, T item);

  protected IList<T> Items { get; }
}


The virtual methods provide the gateway by which you can “hook in” to change or
enhance the list’s normal behavior. The protected Items property allows the
implementer to directly access the “inner list”; this is used to make changes internally
without the virtual methods firing.
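For instance, the following sketch covers the “illegal operation” scenario from the earlier list. The TemperatureList class and its -100 to 100 rule are hypothetical. Add and the indexer’s setter are routed through InsertItem and SetItem, so overriding those two methods enforces the rule. Writing through the protected Items property bypasses it:

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical business rule: readings must lie between -100 and 100.
public class TemperatureList : Collection<int>
{
    protected override void InsertItem (int index, int item)
    {
        Validate (item);               // reject before the inner list changes
        base.InsertItem (index, item);
    }

    protected override void SetItem (int index, int item)
    {
        Validate (item);
        base.SetItem (index, item);
    }

    // Writes straight to the inner list, so no virtual method (and no check) runs.
    public void AddUnchecked (int item) => Items.Add (item);

    static void Validate (int item)
    {
        if (item < -100 || item > 100)
            throw new ArgumentOutOfRangeException (nameof (item), "Reading out of range");
    }
}

static class TemperatureDemo
{
    public static bool TryAdd (TemperatureList list, int value)
    {
        try { list.Add (value); return true; }
        catch (ArgumentOutOfRangeException) { return false; }
    }

    static void Main()
    {
        var temps = new TemperatureList();
        Console.WriteLine (TryAdd (temps, 25));    // True
        Console.WriteLine (TryAdd (temps, 500));   // False
        temps.AddUnchecked (500);                  // bypasses the rule
        Console.WriteLine (temps.Count);           // 2
    }
}
```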


The virtual methods need not be overridden; they can be left alone until there’s a 
requirement to alter the list’s default behavior. The following example demonstrates 
the typical “skeleton” use of Collection<T>: 
using System;
using System.Collections.ObjectModel;   // Collection<T>

public class Animal
{
  public string Name;
  public int Popularity;

  public Animal (string name, int popularity)
  {
    Name = name; Popularity = popularity;
  }
}

public class AnimalCollection : Collection <Animal>
{
  // AnimalCollection is already a fully functioning list of animals.
  // No extra code is required.
}

public class Zoo   // The class that will expose AnimalCollection.
{                  // This would typically have additional members.

  public readonly AnimalCollection Animals = new AnimalCollection();
}

class Program
{
  static void Main()
  {
    Zoo zoo = new Zoo();
    zoo.Animals.Add (new Animal ("Kangaroo", 10));
    zoo.Animals.Add (new Animal ("Mr Sea Lion", 20));
    foreach (Animal a in zoo.Animals) Console.WriteLine (a.Name);
  }
}


As it stands, AnimalCollection is no more functional than a simple List<Animal>; 
its role is to provide a base for future extension. To illustrate, let's now add a Zoo 
property to Animal so that it can reference the Zoo in which it lives and override 
each of the virtual methods in Collection<Animal> to maintain that property 
automatically: 


using System.Collections.ObjectModel;   // Collection<T>

public class Animal
{
  public string Name;
  public int Popularity;
  public Zoo Zoo { get; internal set; }
  public Animal (string name, int popularity)
  {
    Name = name; Popularity = popularity;
  }
}

public class AnimalCollection : Collection <Animal>
{
  Zoo zoo;
  public AnimalCollection (Zoo zoo) { this.zoo = zoo; }

  protected override void InsertItem (int index, Animal item)
  {
    base.InsertItem (index, item);
    item.Zoo = zoo;
  }
  protected override void SetItem (int index, Animal item)
  {
    this [index].Zoo = null;   // detach the animal being replaced
    base.SetItem (index, item);
    item.Zoo = zoo;
  }
  protected override void RemoveItem (int index)
  {
    this [index].Zoo = null;
    base.RemoveItem (index);
  }
  protected override void ClearItems()
  {
    foreach (Animal a in this) a.Zoo = null;
    base.ClearItems();
  }
}

public class Zoo
{
  public readonly AnimalCollection Animals;
  public Zoo() { Animals = new AnimalCollection (this); }
}


Collection<T> also has a constructor accepting an existing IList<T>. Unlike with 
other collection classes, the supplied list is proxied rather than copied, meaning that 
subsequent changes will be reflected in the wrapping Collection<T> (although 
without Collection<T>’s virtual methods firing). Conversely, changes made via the 
Collection<T> will change the underlying list. 
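The proxying works in both directions, as the following sketch shows (ProxyDemo is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

static class ProxyDemo
{
    public static (int wrapperCount, int innerCount) Run()
    {
        var inner = new List<string> { "a" };
        var wrapper = new Collection<string> (inner);   // wraps inner; no copy is made

        inner.Add ("b");     // visible through wrapper (no virtual methods fire)
        wrapper.Add ("c");   // writes through to inner
        return (wrapper.Count, inner.Count);
    }

    static void Main() => Console.WriteLine (Run());   // (3, 3)
}
```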


CollectionBase 


CollectionBase is the nongeneric version of Collection<T> introduced in
Framework 1.0. This provides most of the same features as Collection<T>, but is clumsier
to use. Instead of the template methods InsertItem, RemoveItem, SetItem, and
ClearItems, CollectionBase has “hook” methods that double the number of
methods required: OnInsert, OnInsertComplete, OnSet, OnSetComplete, OnRemove,
OnRemoveComplete, OnClear, and OnClearComplete. Because CollectionBase is
nongeneric, you must also implement typed methods when subclassing it; at a
minimum, a typed indexer and Add method.


KeyedCollection<TKey,TItem> and DictionaryBase


KeyedCollection<TKey,TItem> subclasses Collection<TItem>. It both adds and 
subtracts functionality. What it adds is the ability to access items by key, much like 
with a dictionary. What it subtracts is the ability to proxy your own inner list. 
A keyed collection has some resemblance to an OrderedDictionary in that it
combines a linear list with a hashtable. However, unlike OrderedDictionary, it doesn’t
implement IDictionary and doesn’t support the concept of a key/value pair. Keys
are obtained instead from the items themselves, via the abstract GetKeyForItem
method. This means enumerating a keyed collection is just like enumerating an
ordinary list.


You can best think of KeyedCollection<TKey,TItem> as Collection<TItem> plus 
fast lookup by key. 


Because it subclasses Collection<>, a keyed collection inherits all of
Collection<>’s functionality, except for the ability to specify an existing list in
construction. The additional members it defines are as follows:


public abstract class KeyedCollection <TKey, TItem> : Collection <TItem>
{
  // ...

  protected abstract TKey GetKeyForItem (TItem item);
  protected void ChangeItemKey (TItem item, TKey newKey);

  // Fast lookup by key - this is in addition to lookup by index.
  public TItem this [TKey key] { get; }

  protected IDictionary<TKey, TItem> Dictionary { get; }
}


GetKeyForItem is what the implementer overrides to obtain an item’s key from the
underlying object. The ChangeItemKey method must be called if the item’s key
property changes, in order to update the internal dictionary. The Dictionary property
returns the internal dictionary used to implement the lookup, which is created when
the first item is added. This behavior can be changed by specifying a creation
threshold in the constructor, delaying the internal dictionary from being created until the
threshold is reached (in the interim, a linear search is performed if an item is
requested by key). A good reason not to specify a creation threshold is that having a
valid dictionary can be useful in obtaining an ICollection<> of keys, via the
Dictionary’s Keys property. This collection can then be passed on to a public
property.
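To make the creation threshold concrete, here is a sketch built around a hypothetical ByInitial collection. It keys strings by their first character and reports whether the protected Dictionary property is populated. The base constructor used here takes an IEqualityComparer<TKey> (null selects the default comparer) and the threshold. A threshold of -1 means the dictionary is never created, yet lookup by key still works through a linear search:

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical keyed collection: each string is keyed by its first character.
class ByInitial : KeyedCollection<char, string>
{
    public ByInitial (int threshold) : base (null, threshold) { }

    protected override char GetKeyForItem (string item) => item [0];

    // Exposes whether the protected lookup dictionary has been created.
    public bool HasLookupDictionary => Dictionary != null;
}

static class ThresholdDemo
{
    public static (bool hasDict, string found) Run (int threshold)
    {
        var c = new ByInitial (threshold) { "apple", "banana" };
        return (c.HasLookupDictionary, c ['b']);   // lookup by key works either way
    }

    static void Main()
    {
        Console.WriteLine (Run (0));    // (True, banana)   dictionary built on first Add
        Console.WriteLine (Run (-1));   // (False, banana)  linear search, no dictionary
    }
}
```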


The most common use for KeyedCollection<,> is in providing a collection of
items accessible both by index and by name. To demonstrate this, let’s revisit the
zoo, this time implementing AnimalCollection as a KeyedCollection<string,Animal>:


public class Animal
{
  string name;
  public string Name
  {
    get { return name; }
    set {
      if (Zoo != null) Zoo.Animals.NotifyNameChange (this, value);
      name = value;
    }
  }
  public int Popularity;
  public Zoo Zoo { get; internal set; }

  public Animal (string name, int popularity)
  {
    Name = name; Popularity = popularity;
  }
}

public class AnimalCollection : KeyedCollection <string, Animal>
{
  Zoo zoo;
  public AnimalCollection (Zoo zoo) { this.zoo = zoo; }

  internal void NotifyNameChange (Animal a, string newName) =>
    this.ChangeItemKey (a, newName);

  protected override string GetKeyForItem (Animal item) => item.Name;

  // The following methods would be implemented as in the previous example
  protected override void InsertItem (int index, Animal item)...
  protected override void SetItem (int index, Animal item)...
  protected override void RemoveItem (int index)...
  protected override void ClearItems()...
}

public class Zoo
{
  public readonly AnimalCollection Animals;
  public Zoo() { Animals = new AnimalCollection (this); }
}


The following code demonstrates its use: 


Zoo zoo = new Zoo();
zoo.Animals.Add (new Animal ("Kangaroo", 10));
zoo.Animals.Add (new Animal ("Mr Sea Lion", 20));
Console.WriteLine (zoo.Animals [0].Popularity);               // 10
Console.WriteLine (zoo.Animals ["Mr Sea Lion"].Popularity);   // 20
zoo.Animals ["Kangaroo"].Name = "Mr Roo";
Console.WriteLine (zoo.Animals ["Mr Roo"].Popularity);        // 10

DictionaryBase 


The nongeneric version of KeyedCollection is called DictionaryBase. This legacy
class takes a very different approach in that it implements IDictionary and uses
clumsy hook methods like CollectionBase: OnInsert, OnInsertComplete, OnSet,
OnSetComplete, OnRemove, OnRemoveComplete, OnClear, and OnClearComplete (and
additionally, OnGet). The primary advantage of implementing IDictionary over
taking the KeyedCollection approach is that you don’t need to subclass it in order
to obtain keys. But since the very purpose of DictionaryBase is to be subclassed, it’s
no advantage at all. The improved model in KeyedCollection is almost certainly
due to the fact that it was written some years later, with the benefit of hindsight.
DictionaryBase is best considered useful for backward compatibility.


ReadOnlyCollection<T> 


ReadOnlyCollection<T> is a wrapper, or proxy, that provides a read-only view of a 
collection. This is useful in allowing a class to publicly expose read-only access to a 
collection that the class can still update internally. 


A read-only collection accepts the input collection in its constructor, to which it 
maintains a permanent reference. It doesn’t take a static copy of the input collection, 
so subsequent changes to the input collection are visible through the read-only 
wrapper. 


To illustrate, suppose that your class wants to provide read-only public access to a 
list of strings called Names. We could do this as follows: 


public class Test 


{ 
List<string> names = new List<string>(); 
public IReadOnlyList<string> Names => names; 


} 


Although Names returns a read-only interface, the consumer can still downcast at 
runtime to List<string> or IList<string> and then call Add, Remove, or Clear on 
the list. ReadOnlyCollection<T> provides a more robust solution: 


public class Test 


{ 
List<string> names = new List<string>(); 
public ReadOnlyCollection<string> Names { get; private set; } 


public Test() => Names = new ReadOnlyCollection<string> (names); 


public void AddInternally() => names.Add ("test"); 
} 


Now, only members within the Test class can alter the list of names: 


Test t = new Test(); 


Console.WriteLine (t.Names.Count); // 0
t.AddInternally(); 

Console.WriteLine (t.Names.Count); // 1 

t.Names.Add ("test"); // Compiler error 


((IList<string>) t.Names).Add ("test"); // NotSupportedException 
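The difference between the two designs can be sketched as follows. ReadOnlyDemo is a hypothetical name. List<T>.AsReadOnly is an existing shortcut that returns a ReadOnlyCollection<T> wrapping the list:

```csharp
using System;
using System.Collections.Generic;

static class ReadOnlyDemo
{
    // Exposing the List<T> itself through IReadOnlyList<T>: a cast undoes the protection.
    public static int DowncastAndAdd()
    {
        var names = new List<string>();
        IReadOnlyList<string> exposed = names;
        ((List<string>) exposed).Add ("sneaky");     // compiles and succeeds
        return names.Count;
    }

    // Exposing a ReadOnlyCollection<T> wrapper: writes fail at runtime.
    public static bool WrapperRejectsAdd()
    {
        var names = new List<string>();
        IList<string> exposed = names.AsReadOnly();
        try { exposed.Add ("sneaky"); return false; }
        catch (NotSupportedException) { return true; }
    }

    static void Main()
    {
        Console.WriteLine (DowncastAndAdd());      // 1
        Console.WriteLine (WrapperRejectsAdd());   // True
    }
}
```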
Immutable Collections


We just described how ReadOnlyCollection<T> creates a read-only view of a
collection. Restricting the ability to write (mutate) a collection, or any other object,
simplifies software and reduces bugs.


The immutable collections extend this principle by providing collections that cannot
be modified at all after initialization. Should you need to add an item to an
immutable collection, you must instantiate a new collection, leaving the old one untouched.


Immutability is a hallmark of functional programming and has the following 
benefits: 


• It eliminates a large class of bugs associated with changing state.
• It vastly simplifies parallelism and multithreading, by avoiding most of the
  thread-safety problems that we describe in Chapters 14, 22, and 23.
• It makes code easier to reason about.


The disadvantage of immutability is that when you need to make a change, you 
must create a whole new object. This incurs a performance hit, although there are 
mitigating strategies that we discuss in this section, including the ability to reuse 
portions of the original structure. 


The immutable collections are built into .NET Core (in .NET Framework, they are
available via the System.Collections.Immutable NuGet package). All collections are
defined in the System.Collections.Immutable namespace:


Type                              Internal structure
ImmutableArray<T>                 Array
ImmutableList<T>                  AVL tree
ImmutableDictionary<K,V>          AVL tree
ImmutableHashSet<T>               AVL tree
ImmutableSortedDictionary<K,V>    AVL tree
ImmutableSortedSet<T>             AVL tree
ImmutableStack<T>                 Linked list
ImmutableQueue<T>                 Linked list





The ImmutableArray<T> and ImmutableList<T> types are both immutable versions 
of List<T>. Both do the same job but with different performance characteristics 
that we discuss in “Immutable Collections and Performance” on page 359. 


The immutable collections expose a public interface similar to their mutable
counterparts. The key difference is that the methods that appear to alter the collection
(such as Add or Remove) don’t alter the original collection; instead they return a new
collection with the requested item added or removed.
Immutable collections prevent the adding and removing of
items; they don’t prevent the items themselves from being 
mutated. To get the full benefits of immutability, you need to 
ensure that only immutable items end up in an immutable 
collection. 
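Here is a small illustration of that caveat (MutableItemDemo is a hypothetical name). An ImmutableList holding a mutable StringBuilder cannot gain or lose items, but the StringBuilder it holds can still be changed:

```csharp
using System;
using System.Collections.Immutable;
using System.Text;

static class MutableItemDemo
{
    public static string Run()
    {
        var list = ImmutableList.Create (new StringBuilder ("abc"));
        list [0].Append ("def");     // the list is unchanged, but its item is mutated
        return list [0].ToString();
    }

    static void Main() => Console.WriteLine (Run());   // abcdef
}
```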


Creating Immutable Collections 


Each immutable collection type offers a Create<T>() method, which accepts 
optional initial values and returns an initialized immutable collection: 


ImmutableArray<int> array = ImmutableArray.Create<int> (1, 2, 3); 


Each collection also offers a CreateRange<T> method, which does the same job as 
Create<T>; the difference is that its parameter type is IEnumerable<T> instead of 
params T[]. 


You can also create an immutable collection from an existing IEnumerable<T>,
using appropriate extension methods (ToImmutableArray, ToImmutableList,
ToImmutableDictionary, and so on):


var list = new[] { 1, 2, 3 }.ToImmutableList(); 


Manipulating Immutable Collections 


The Add method returns a new collection containing the existing elements plus the 
new one: 


var oldList = ImmutableList.Create<int> (1, 2, 3); 
ImmutableList<int> newList = oldList.Add (4); 


Console.WriteLine (oldList.Count); // 3 (unaltered) 
Console.WriteLine (newList.Count); // 4


The Remove method operates in the same fashion, returning a new collection with 
the item removed. 


Repeatedly adding or removing elements in this manner is inefficient, because a
new immutable collection is created for each add or remove operation. A better
solution is to call AddRange (or RemoveRange), which accepts an IEnumerable<T> of
items, which are all added or removed in one go:


var anotherList = oldList.AddRange (new[] { 4, 5, 6 }); 


The immutable list and array also define Insert and InsertRange methods to insert 
elements at a particular index, a RemoveAt method to remove at an index, and 
RemoveAll, which removes based on a predicate. 
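The following minimal sketch chains these methods (ImmutableEditsDemo is a hypothetical name). Each call returns a new list, and the original stays as it was:

```csharp
using System;
using System.Collections.Immutable;

static class ImmutableEditsDemo
{
    public static (string before, string after) Run()
    {
        var original = ImmutableList.Create (1, 2, 3, 4, 5);
        ImmutableList<int> edited = original
            .Insert (0, 0)                 // 0,1,2,3,4,5
            .RemoveAt (1)                  // 0,2,3,4,5
            .RemoveAll (n => n % 2 == 1);  // 0,2,4
        return (string.Join (",", original), string.Join (",", edited));
    }

    static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine (before);   // 1,2,3,4,5
        Console.WriteLine (after);    // 0,2,4
    }
}
```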
Builders


For more complex initialization needs, each immutable collection class defines a
builder counterpart. Builders are classes that are functionally equivalent to a
mutable collection, with similar performance characteristics. After the data is initialized,
calling .ToImmutable() on a builder returns an immutable collection:


ImmutableArray<int>.Builder builder = ImmutableArray.CreateBuilder<int>();
builder.Add (1);
builder.Add (2);
builder.Add (3);
builder.RemoveAt (0);
ImmutableArray<int> myImmutable = builder.ToImmutable();


You also can use builders to batch multiple updates to an existing immutable 
collection: 


var builder2 = myImmutable.ToBuilder();
builder2.Add (4);      // Efficient
builder2.Remove (2);   // Efficient
// More changes to builder...
// Return a new immutable collection with all the changes applied:
ImmutableArray<int> myImmutable2 = builder2.ToImmutable();


Immutable Collections and Performance 


Most of the immutable collections use an AVL tree internally, which allows the
add/remove operations to reuse portions of the original internal structure rather than
having to re-create the entire thing from scratch. This reduces the overhead of
add/remove operations from potentially huge (with large collections) to just moderately
large, but it comes at the cost of making read operations slower. The end result is
that most immutable collections are slower than their mutable counterparts for both
reading and writing.


The most seriously affected is ImmutableList<T>, which for both read and add
operations is 10 to 200 times slower than List<T> (depending on the size of the
list). This is why ImmutableArray<T> exists: by using an array inside, it avoids the
overhead for read operations (for which it’s comparable in performance to an
ordinary mutable array). The flip side is that it’s much slower than (even)
ImmutableList<T> for add operations because none of the original structure can be reused.


Hence, ImmutableArray<T> is desirable when you want unimpeded read
performance and don't expect many subsequent calls to Add or Remove (without
using a builder):


Type                  Read performance    Add performance
ImmutableList<T>      Slow                Slow
ImmutableArray<T>     Very fast           Very slow
Calling Remove on an ImmutableArray is more expensive than
calling Remove on a List<T> (even in the worst-case scenario
of removing the first element) because allocating the new
collection places additional load on the garbage collector.


Although the immutable collections as a whole incur a potentially significant
performance cost, it’s important to keep the overall magnitude in perspective. An Add
operation on an ImmutableList with a million elements is still likely to occur in less
than a microsecond on a typical laptop, and a read operation, in less than 100
nanoseconds. And, if you need to perform write operations in a loop, you can avoid the
accumulated cost with a builder.
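A minimal sketch of that loop pattern, assuming the goal is an ImmutableList<int>: the builder is mutated in place inside the loop and frozen once at the end, instead of calling Add on an immutable list once per iteration. BuilderLoopDemo and Build are hypothetical names.

```csharp
using System.Collections.Immutable;

static class BuilderLoopDemo
{
  // Builds an immutable list of 0..count-1 via a mutable builder,
  // creating the immutable collection only once (ToImmutable).
  public static ImmutableList<int> Build (int count)
  {
    ImmutableList<int>.Builder builder = ImmutableList.CreateBuilder<int>();
    for (int i = 0; i < count; i++)
      builder.Add (i);
    return builder.ToImmutable();
  }
}
```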


The following factors also work to mitigate the costs: 


• Immutability allows for easy concurrency and parallelization (Chapter 23), so
  you can employ all available cores. Parallelizing with mutable state easily leads
  to errors, and requires the use of locks or concurrent collections, both of which
  hurt performance.
• With immutability, you don't need to “defensively copy” collections or data
  structures to guard against unexpected change. This was a factor in favoring
  the use of immutable collections in writing recent portions of Visual Studio.
• In most typical programs, few collections have enough items for the difference
  to matter.


In addition to Visual Studio, the well-performing Microsoft Roslyn toolchain was 
built with immutable collections, demonstrating how the benefits can outweigh the 
costs. 


Plugging in Equality and Order 


In the sections “Equality Comparison” on page 296 and “Order Comparison” on 
page 306 in Chapter 6, we described the standard .NET protocols that make a type 
equatable, hashable, and comparable. A type that implements these protocols can 
function correctly in a dictionary or sorted list “out of the box.” More specifically: 


• A type for which Equals and GetHashCode return meaningful results can be
  used as a key in a Dictionary or Hashtable.
• A type that implements IComparable/IComparable<T> can be used as a key in
  any of the sorted dictionaries or lists.
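To illustrate both points, here is a sketch with a hypothetical GridPoint key type: overriding Equals and GetHashCode lets a Dictionary find a distinct but equal instance, and implementing IComparable<GridPoint> lets a SortedDictionary order its keys. HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: value-based equality plus a natural order (X, then Y).
public sealed class GridPoint : IComparable<GridPoint>
{
  public readonly int X, Y;
  public GridPoint (int x, int y) { X = x; Y = y; }

  public override bool Equals (object obj) =>
    obj is GridPoint p && p.X == X && p.Y == Y;

  public override int GetHashCode() => HashCode.Combine (X, Y);

  public int CompareTo (GridPoint other)
  {
    if (other is null) return 1;                  // Treat null as smallest
    int byX = X.CompareTo (other.X);
    return byX != 0 ? byX : Y.CompareTo (other.Y);
  }
}

static class KeyDemo
{
  // Dictionary uses Equals/GetHashCode, so an equal new instance is found.
  public static bool FoundInDictionary()
  {
    var d = new Dictionary<GridPoint, string>();
    d [new GridPoint (1, 2)] = "a";
    return d.ContainsKey (new GridPoint (1, 2));
  }

  // SortedDictionary orders keys via IComparable<GridPoint>.
  public static List<GridPoint> SortedKeys()
  {
    var sd = new SortedDictionary<GridPoint, string>();
    sd [new GridPoint (2, 0)] = "b";
    sd [new GridPoint (1, 5)] = "a";
    return new List<GridPoint> (sd.Keys);
  }
}
```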


A type’s default equating or comparison implementation typically reflects what is 
most “natural” for that type. Sometimes, however, the default behavior is not what 
you want. You might need a dictionary whose string type key is treated without 
respect to case. Or you might want a sorted list of customers, sorted by each
customer’s postcode. For this reason, the .NET Framework also defines a matching set
of “plug-in” protocols. The plug-in protocols achieve two things: 
• They allow you to switch in alternative equating or comparison behavior.
• They allow you to use a dictionary or sorted collection with a key type that’s
  not intrinsically equatable or comparable.


The plug-in protocols consist of the following interfaces: 


IEqualityComparer and IEqualityComparer<T>
• Performs plug-in equality comparison and hashing
• Recognized by Hashtable and Dictionary

IComparer and IComparer<T>
• Performs plug-in order comparison
• Recognized by the sorted dictionaries and collections; also, Array.Sort


Each interface comes in both generic and nongeneric forms. The IEqualityComparer
interfaces also have a default implementation in a class called EqualityComparer.


In addition, Framework 4.0 introduced two new interfaces, IStructuralEquatable
and IStructuralComparable, which allow for the option of structural
comparisons on classes and arrays.


IEqualityComparer and EqualityComparer


An equality comparer switches in nondefault equality and hashing behavior,
primarily for the Dictionary and Hashtable classes.


Recall the requirements of a hashtable-based dictionary. It needs answers to two 
questions for any given key: 


• Is it the same as another?
• What is its integer hashcode?


An equality comparer answers these questions by implementing the
IEqualityComparer interfaces:


public interface IEqualityComparer<T>
{
  bool Equals (T x, T y);
  int GetHashCode (T obj);
}

public interface IEqualityComparer     // Nongeneric version
{
  bool Equals (object x, object y);
  int GetHashCode (object obj);
}










To write a custom comparer, you implement one or both of these interfaces
(implementing both gives maximum interoperability). Because this is somewhat tedious,
an alternative is to subclass the abstract EqualityComparer class, defined as follows:


public abstract class EqualityComparer<T> : IEqualityComparer,
                                            IEqualityComparer<T>
{
  public abstract bool Equals (T x, T y);
  public abstract int GetHashCode (T obj);

  bool IEqualityComparer.Equals (object x, object y);
  int IEqualityComparer.GetHashCode (object obj);

  public static EqualityComparer<T> Default { get; }
}


EqualityComparer implements both interfaces; your job is simply to override the 
two abstract methods. 


The semantics for Equals and GetHashCode follow the same rules as those for 
object.Equals and object.GetHashCode, described in Chapter 6. In the following 
example, we define a Customer class with two fields, and then write an equality 
comparer that matches both the first and last names: 


public class Customer
{
  public string LastName;
  public string FirstName;

  public Customer (string last, string first)
  {
    LastName = last;
    FirstName = first;
  }
}

public class LastFirstEqComparer : EqualityComparer <Customer>
{
  public override bool Equals (Customer x, Customer y)
    => x.LastName == y.LastName && x.FirstName == y.FirstName;

  public override int GetHashCode (Customer obj)
    => (obj.LastName + ";" + obj.FirstName).GetHashCode();
}


To illustrate how this works, let’s create two customers: 


Customer c1 = new Customer ("Bloggs", "Joe"); 
Customer c2 = new Customer ("Bloggs", "Joe"); 


Because we've not overridden object.Equals, normal reference-type equality
semantics apply:


Console.WriteLine (c1 == c2); // False 
Console.WriteLine (c1.Equals (c2)); // False 
The same default equality semantics apply when using these customers in a
Dictionary without specifying an equality comparer:


var d = new Dictionary<Customer, string>(); 
d [c1] = "Joe"; 
Console.WriteLine (d.ContainsKey (c2)); // False 


Now, with the custom equality comparer: 


var eqComparer = new LastFirstEqComparer();
var d = new Dictionary<Customer, string> (eqComparer);
d [c1] = "Joe";
Console.WriteLine (d.ContainsKey (c2));         // True

In this example, we would have to be careful not to change the customer's
FirstName or LastName while it was in use in the dictionary; otherwise, its hashcode
would change and the Dictionary would break.
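The following self-contained sketch repeats the Customer and LastFirstEqComparer types from above and then mutates a key while it is stored in the dictionary. After the change, the lookup uses the new hashcode and so (barring an extremely unlikely hash collision) no longer finds the entry, even though the entry is still there. MutatedKeyDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public class Customer
{
  public string LastName;
  public string FirstName;
  public Customer (string last, string first) { LastName = last; FirstName = first; }
}

public class LastFirstEqComparer : EqualityComparer<Customer>
{
  public override bool Equals (Customer x, Customer y)
    => x.LastName == y.LastName && x.FirstName == y.FirstName;

  public override int GetHashCode (Customer obj)
    => (obj.LastName + ";" + obj.FirstName).GetHashCode();
}

static class MutatedKeyDemo
{
  // Reports whether the key is found before and after mutating it.
  public static (bool Before, bool After, int Count) Run()
  {
    var c1 = new Customer ("Bloggs", "Joe");
    var d = new Dictionary<Customer, string> (new LastFirstEqComparer());
    d [c1] = "Joe";

    bool before = d.ContainsKey (c1);
    c1.FirstName = "Jim";                // Changes c1's hashcode while it is a key
    bool after = d.ContainsKey (c1);     // Searched under the new hashcode

    return (before, after, d.Count);     // The entry still occupies the dictionary
  }
}
```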


EqualityComparer<T>.Default 


Calling EqualityComparer<T>.Default returns a general-purpose equality comparer
that you can use as an alternative to the static object.Equals method. The
advantage is that it first checks whether T implements IEquatable<T>, and if so,
calls that implementation instead, avoiding the boxing overhead. This is particularly
useful in generic methods:


static bool Foo<T> (T x, T y)
{
  bool same = EqualityComparer<T>.Default.Equals (x, y);
  return same;
}
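One way to observe this is a sketch with a hypothetical Vec2 struct that counts which Equals overload runs. EqualityComparer<Vec2>.Default reaches the typed IEquatable<Vec2>.Equals, whereas the static object.Equals boxes both arguments and calls the object overload. HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct that records which Equals overload is called.
public struct Vec2 : IEquatable<Vec2>
{
  public static int TypedCalls, ObjectCalls;
  public int X, Y;
  public Vec2 (int x, int y) { X = x; Y = y; }

  public bool Equals (Vec2 other)
  {
    TypedCalls++;                                   // IEquatable<Vec2> path
    return X == other.X && Y == other.Y;
  }

  public override bool Equals (object obj)
  {
    ObjectCalls++;                                  // Boxed, object-based path
    return obj is Vec2 v && X == v.X && Y == v.Y;
  }

  public override int GetHashCode() => HashCode.Combine (X, Y);
}

static class DefaultComparerDemo
{
  public static (int Typed, int Object) Run()
  {
    Vec2.TypedCalls = Vec2.ObjectCalls = 0;
    var a = new Vec2 (1, 2);
    var b = new Vec2 (1, 2);
    EqualityComparer<Vec2>.Default.Equals (a, b);   // Calls Equals (Vec2)
    object.Equals (a, b);                           // Boxes, calls Equals (object)
    return (Vec2.TypedCalls, Vec2.ObjectCalls);
  }
}
```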


IComparer and Comparer 


Comparers are used to switch in custom ordering logic for sorted dictionaries and 
collections. 


Note that a comparer is useless to the unsorted dictionaries such as Dictionary and 
Hashtable—these require an IEqualityComparer to get hashcodes. Similarly, an 
equality comparer is useless for sorted dictionaries and collections. 


Here are the IComparer interface definitions: 
public interface IComparer
{
  int Compare (object x, object y);
}

public interface IComparer <in T>
{
  int Compare (T x, T y);
}


As with equality comparers, there's an abstract class that you can subtype instead of 
implementing the interfaces: 
public abstract class Comparer<T> : IComparer, IComparer<T>
{
  public static Comparer<T> Default { get; }

  public abstract int Compare (T x, T y);       // Implemented by you
  int IComparer.Compare (object x, object y);   // Implemented for you
}


The following example illustrates a class that describes a wish as well as a comparer 
that sorts wishes by priority: 


class Wish
{
  public string Name;
  public int Priority;

  public Wish (string name, int priority)
  {
    Name = name;
    Priority = priority;
  }
}

class PriorityComparer : Comparer <Wish>
{
  public override int Compare (Wish x, Wish y)
  {
    if (object.Equals (x, y)) return 0;    // Fail-safe check
    return x.Priority.CompareTo (y.Priority);
  }
}


The object.Equals check ensures that we can never contradict the Equals method.


Calling the static object.Equals method in this case is better than calling x.Equals 
because it still works if x is null! 


Here's how our PriorityComparer is used to sort a List: 


var wishList = new List<Wish>();
wishList.Add (new Wish ("Peace", 2));
wishList.Add (new Wish ("Wealth", 3));
wishList.Add (new Wish ("Love", 2));
wishList.Add (new Wish ("3 more wishes", 1));

wishList.Sort (new PriorityComparer());
foreach (Wish w in wishList) Console.Write (w.Name + " | ");

// OUTPUT: 3 more wishes | Love | Peace | Wealth
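As an alternative to subclassing, Comparer<T>.Create (available since .NET Framework 4.5) wraps a Comparison<T> delegate in a Comparer<T>. The sketch below orders wishes by priority that way; unlike PriorityComparer, it omits the object.Equals fail-safe check, and it uses distinct priorities so the resulting order does not depend on how the sort treats ties. ComparerCreateDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

class Wish
{
  public string Name;
  public int Priority;
  public Wish (string name, int priority) { Name = name; Priority = priority; }
}

static class ComparerCreateDemo
{
  // Comparer built from a lambda; no Comparer<Wish> subclass required.
  static readonly Comparer<Wish> ByPriority =
    Comparer<Wish>.Create ((x, y) => x.Priority.CompareTo (y.Priority));

  public static List<string> SortedNames()
  {
    var wishes = new List<Wish>
    {
      new Wish ("Wealth", 3),
      new Wish ("Peace", 2),
      new Wish ("3 more wishes", 1)
    };
    wishes.Sort (ByPriority);
    return wishes.ConvertAll (w => w.Name);
  }
}
```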


In the next example, SurnameComparer allows you to sort surname strings in an 
order suitable for a phone book listing: 


class SurnameComparer : Comparer <string>
{
  string Normalize (string s)
  {
    s = s.Trim().ToUpper();
    if (s.StartsWith ("MC")) s = "MAC" + s.Substring (2);
    return s;
  }

  public override int Compare (string x, string y)
    => Normalize (x).CompareTo (Normalize (y));
}


Here's SurnameComparer in use in a sorted dictionary: 


var dic = new SortedDictionary<string,string> (new SurnameComparer());
dic.Add ("MacPhail", "second!");
dic.Add ("MacWilliam", "third!");
dic.Add ("McDonald", "first!");

foreach (string s in dic.Values)
  Console.Write (s + " ");              // first! second! third!


StringComparer 


StringComparer is a predefined plug-in class for equating and comparing strings, 
allowing you to specify language and case sensitivity. StringComparer implements 
both IEqualityComparer and IComparer (and their generic versions), so you can 
use it with any type of dictionary or sorted collection. 


Because StringComparer is abstract, you obtain instances via its static properties.
StringComparer.Ordinal mirrors the default behavior for string equality
comparison and StringComparer.CurrentCulture for order comparison. Here are all of its
static members:


public static StringComparer CurrentCulture { get; } 
public static StringComparer CurrentCultureIgnoreCase { get; } 
public static StringComparer InvariantCulture { get; } 
public static StringComparer InvariantCultureIgnoreCase { get; } 
public static StringComparer Ordinal { get; } 
public static StringComparer OrdinalIgnoreCase { get; } 
public static StringComparer Create (CultureInfo culture, 
bool ignoreCase); 


In the following example, an ordinal case-insensitive dictionary is created such that 
dict["Joe"] and dict["JOE"] mean the same thing: 
var dict = new Dictionary<string, int> (StringComparer.OrdinalIgnoreCase);

In the next example, an array of names is sorted, using Australian English:


string[] names = { "Tom", "HARRY", "sheila" }; 
CultureInfo ci = new CultureInfo ("en-AU"); 
Array.Sort<string> (names, StringComparer.Create (ci, false)); 
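To confirm that the ordinal case-insensitive dictionary described above treats "Joe" and "JOE" as one key, here is a small sketch (CaseInsensitiveDictDemo is a hypothetical name). It deliberately uses OrdinalIgnoreCase rather than a culture so that the result does not depend on the machine's culture settings.

```csharp
using System;
using System.Collections.Generic;

static class CaseInsensitiveDictDemo
{
  public static (int Value, int Count) Run()
  {
    var dict = new Dictionary<string, int> (StringComparer.OrdinalIgnoreCase);
    dict ["Joe"] = 1;
    dict ["JOE"] = 2;                    // Same key: overwrites the value
    return (dict ["joe"], dict.Count);   // Any casing finds the entry
  }
}
```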


The final example is a culture-aware version of the SurnameComparer we wrote in 
the previous section (to compare names suitable for a phone book listing): 


class SurnameComparer : Comparer<string>
{
  StringComparer strCmp;

  public SurnameComparer (CultureInfo ci)
  {
    // Create a case-sensitive, culture-sensitive string comparer
    strCmp = StringComparer.Create (ci, false);
  }

  string Normalize (string s)
  {
    s = s.Trim();
    if (s.ToUpper().StartsWith ("MC")) s = "MAC" + s.Substring (2);
    return s;
  }

  public override int Compare (string x, string y)
  {
    // Directly call Compare on our culture-aware StringComparer
    return strCmp.Compare (Normalize (x), Normalize (y));
  }
}


IStructuralEquatable and IStructuralComparable 


As we discussed in Chapter 6, structs implement structural comparison by default: 
two structs are equal if all of their fields are equal. Sometimes, however, structural 
equality and order comparison are useful as plug-in options on other types, as 
well—such as arrays. Framework 4.0 introduced two new interfaces to help with 
this: 


public interface IStructuralEquatable
{
  bool Equals (object other, IEqualityComparer comparer);
  int GetHashCode (IEqualityComparer comparer);
}

public interface IStructuralComparable
{
  int CompareTo (object other, IComparer comparer);
}


The IEqualityComparer/IComparer that you pass in are applied to each individual 
element in the composite object. We can demonstrate this by using arrays. In the 
following example, we compare two arrays for equality, first using the default 
Equals method, then using IStructuralEquatable’s version: 


int[] a1 = { 1, 2, 3 };
int[] a2 = { 1, 2, 3 };
IStructuralEquatable se1 = a1;
Console.Write (a1.Equals (a2));                                  // False
Console.Write (se1.Equals (a2, EqualityComparer<int>.Default));  // True
Here’s another example:


string[] a1 = "the quick brown fox".Split();
string[] a2 = "THE QUICK BROWN FOX".Split();
IStructuralEquatable se1 = a1;
bool isTrue = se1.Equals (a2, StringComparer.InvariantCultureIgnoreCase);
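IStructuralComparable works the same way for ordering. The sketch below (StructuralCompareDemo is a hypothetical name) compares two int arrays of equal length element by element with Comparer<int>.Default; the first unequal pair (3 versus 4) decides the result, so it is negative.

```csharp
using System.Collections;
using System.Collections.Generic;

static class StructuralCompareDemo
{
  public static int Run()
  {
    int[] a1 = { 1, 2, 3 };
    int[] a2 = { 1, 2, 4 };
    IStructuralComparable sc1 = a1;
    // Applies the comparer to each pair of elements in turn.
    return sc1.CompareTo (a2, Comparer<int>.Default);
  }
}
```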



LINQ Queries 


LINQ, or Language-Integrated Query, is a set of language and framework features 
for writing structured type-safe queries over local object collections and remote data 
sources. 


LINQ enables you to query any collection implementing IEnumerable<T>, whether 
an array, list, or XML DOM, as well as remote data sources, such as tables in a SQL 
Server database. LINQ offers the benefits of both compile-time type checking and 
dynamic query composition. 


This chapter describes the LINQ architecture and the fundamentals of writing
queries. All core types are defined in the System.Linq and System.Linq.Expressions
namespaces.


The examples in this and the following two chapters are
preloaded into an interactive querying tool called LINQPad. You
can download LINQPad from www.linqpad.net.


Getting Started 


The basic units of data in LINQ are sequences and elements. A sequence is any object 
that implements IEnumerable<T>, and an element is each item in the sequence. In 
the following example, names is a sequence, and "Tom", "Dick", and "Harry" are 
elements: 


string[] names = { "Tom", "Dick", "Harry" }; 


We call this a local sequence because it represents a local collection of objects in 
memory. 


A query operator is a method that transforms a sequence. A typical query operator
accepts an input sequence and emits a transformed output sequence. In the
Enumerable class in System.Linq, there are around 40 query operators, all
implemented as static extension methods. These are called standard query operators.
Queries that operate over local sequences are called local
queries or LINQ-to-objects queries.


LINQ also supports sequences that can be dynamically fed
from a remote data source such as a SQL Server database.
These sequences additionally implement the IQueryable<T>
interface and are supported through a matching set of
standard query operators in the Queryable class. We discuss this
further in “Interpreted Queries” on page 398.


A query is an expression that, when enumerated, transforms sequences with query 
operators. The simplest query comprises one input sequence and one operator. For 
instance, we can apply the Where operator on a simple array to extract those strings 
whose length is at least four characters, as follows: 


string[] names = { "Tom", "Dick", "Harry" }; 
IEnumerable<string> filteredNames = System.Linq.Enumerable.Where 
(names, n => n.Length >= 4); 
foreach (string n in filteredNames) 
Console.WriteLine (n); 


OUTPUT: 
Dick 
Harry 


Because the standard query operators are implemented as extension methods, we 
can call Where directly on names, as though it were an instance method: 


IEnumerable<string> filteredNames = names.Where (n => n.Length >= 4);


For this to compile, you must import the System.Linq namespace. Here’s a
complete example:

using System;
using System.Collections.Generic;
using System.Linq;

class LinqDemo
{
  static void Main()
  {
    string[] names = { "Tom", "Dick", "Harry" };
    IEnumerable<string> filteredNames = names.Where (n => n.Length >= 4);
    foreach (string name in filteredNames) Console.WriteLine (name);
  }
}
OUTPUT: 
Dick 
Harry 
We could further shorten our code by implicitly typing
filteredNames:

var filteredNames = names.Where (n => n.Length >= 4); 
This can hinder readability, however, outside of an IDE, where 
there are no tool tips to help. For this reason, we make less use 
of implicit typing in this chapter than you might in your own 
projects. 


Most query operators accept a lambda expression as an argument. The lambda 
expression helps guide and shape the query. In our example, the lambda expression 
is as follows: 


n => n.Length >= 4 


The input argument corresponds to an input element. In this case, the input
argument n represents each name in the array and is of type string. The Where operator
requires that the lambda expression return a bool value, which, if true, indicates
that the element should be included in the output sequence. Here's its signature:


public static IEnumerable<TSource> Where<TSource> 
(this IEnumerable<TSource> source, Func<TSource,bool> predicate) 




The following query extracts all names that contain the letter “a”: 


IEnumerable<string> filteredNames = names.Where (n => n.Contains ("a")); 


foreach (string name in filteredNames) 
Console.WriteLine (name); // Harry 


So far, we've built queries using extension methods and lambda expressions. As
you'll see shortly, this strategy is highly composable in that it allows the chaining of
query operators. In this book, we refer to this as fluent syntax.¹ C# also provides
another syntax for writing queries, called query expression syntax. Here's our
preceding query written as a query expression:


IEnumerable<string> filteredNames = from n in names
                                    where n.Contains ("a")
                                    select n;


Fluent syntax and query syntax are complementary. In the following two sections, 
we explore each in more detail. 


Fluent Syntax 


Fluent syntax is the most flexible and fundamental. In this section, we describe how 
to chain query operators to form more complex queries—and show why extension 
methods are important to this process. We also describe how to formulate lambda 
expressions for a query operator and introduce several new query operators. 





1 The term is based on Eric Evans and Martin Fowler’s work on fluent interfaces. 
Chaining Query Operators


In the preceding section, we showed two simple queries, each comprising a single
query operator. To build more complex queries, you append additional query
operators to the expression, creating a chain. To illustrate, the following query extracts
all strings containing the letter “a”, sorts them by length, and then converts the
results to uppercase:

using System;
using System.Collections.Generic;
using System.Linq;

class LinqDemo
{
  static void Main()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    IEnumerable<string> query = names
      .Where   (n => n.Contains ("a"))
      .OrderBy (n => n.Length)
      .Select  (n => n.ToUpper());

    foreach (string name in query) Console.WriteLine (name);
  }
}
OUTPUT: 
JAY 
MARY 
HARRY 


The variable n in our example is privately scoped to each of 
the lambda expressions. We can reuse the identifier n for the 
same reason that we can reuse the identifier c in the following 
method: 


void Test()
{
  foreach (char c in "string1") Console.Write (c);
  foreach (char c in "string2") Console.Write (c);
  foreach (char c in "string3") Console.Write (c);
}


Where, OrderBy, and Select are standard query operators that resolve to extension
methods in the Enumerable class (if you import the System.Linq namespace).


We already introduced the Where operator, which emits a filtered version of the 
input sequence. The OrderBy operator emits a sorted version of its input sequence; 
the Select method emits a sequence in which each input element is transformed or 
projected with a given lambda expression (n.ToUpper(), in this case). Data flows 
from left to right through the chain of operators, so the data is first filtered, then 
sorted, and then projected. 
A query operator never alters the input sequence; instead, it
returns a new sequence. This is consistent with the functional 
programming paradigm that inspired LINQ. 


Here are the signatures of each of these extension methods (with the OrderBy
signature slightly simplified):


public static IEnumerable<TSource> Where<TSource> 
(this IEnumerable<TSource> source, Func<TSource,bool> predicate) 


public static IEnumerable<TSource> OrderBy<TSource, TKey> 
(this IEnumerable<TSource> source, Func<TSource,TKey> keySelector) 


public static IEnumerable<TResult> Select<TSource, TResult> 
(this IEnumerable<TSource> source, Func<TSource,TResult> selector) 


When query operators are chained as in this example, the output sequence of one
operator is the input sequence of the next. The complete query resembles a
production line of conveyor belts, as illustrated in Figure 8-1.





Figure 8-1. Chaining query operators (Where with n => n.Contains ("a"), then OrderBy with n => n.Length, then Select with n => n.ToUpper())


We can construct the identical query progressively, as follows: 


// You must import the System.Linq namespace for this to compile:

IEnumerable<string> filtered   = names.Where (n => n.Contains ("a"));
IEnumerable<string> sorted     = filtered.OrderBy (n => n.Length);
IEnumerable<string> finalQuery = sorted.Select (n => n.ToUpper());


finalQuery is compositionally identical to the query we constructed previously. 
Further, each intermediate step also comprises a valid query that we can execute: 


foreach (string name in filtered)
  Console.Write (name + "|");        // Harry|Mary|Jay|

Console.WriteLine();
foreach (string name in sorted)
  Console.Write (name + "|");        // Jay|Mary|Harry|

Console.WriteLine();
foreach (string name in finalQuery)
  Console.Write (name + "|");        // JAY|MARY|HARRY|
Why extension methods are important


Instead of using extension method syntax, you can use conventional static method 
syntax to call the query operators: 


IEnumerable<string> filtered = Enumerable.Where (names, 
n => n.Contains ("a")); 
IEnumerable<string> sorted = Enumerable.OrderBy (filtered, n => n.Length); 
IEnumerable<string> finalQuery = Enumerable.Select (sorted, 
n => n.ToUpper()); 


This is, in fact, how the compiler translates extension method calls. Shunning
extension methods comes at a cost, however, if you want to write a query in a single
statement as we did earlier. Let’s revisit the single-statement query, first in extension
method syntax:


IEnumerable<string> query = names.Where (n => n.Contains ("a")) 
.OrderBy (n => n.Length) 
.Select (n => n.ToUpper()); 


Its natural linear shape reflects the left-to-right flow of data and keeps lambda 
expressions alongside their query operators (infix notation). Without extension 
methods, the query loses its fluency: 


IEnumerable<string> query =
  Enumerable.Select (
    Enumerable.OrderBy (
      Enumerable.Where (
        names, n => n.Contains ("a")
      ), n => n.Length
    ), n => n.ToUpper()
  );


Composing Lambda Expressions 


In previous examples, we fed the following lambda expression to the Where 
operator: 


n => n.Contains ("a") // Input type = string, return type = bool. 


A lambda expression that takes a value and returns a bool is 
called a predicate. 


The purpose of the lambda expression depends on the particular query operator. 
With the Where operator, it indicates whether an element should be included in the 
output sequence. In the case of the OrderBy operator, the lambda expression maps 
each element in the input sequence to its sorting key. With the Select operator, the 
lambda expression determines how each element in the input sequence is
transformed before being fed to the output sequence.
A lambda expression in a query operator always works on
individual elements in the input sequence—not the sequence 
as a whole. 


The query operator evaluates your lambda expression upon demand, typically once 
per element in the input sequence. Lambda expressions allow you to feed your own 
logic into the query operators. This makes the query operators versatile, and simple 
under the hood. Here’s a complete implementation of Enumerable.Where, exception
handling aside: 


public static IEnumerable<TSource> Where<TSource>
  (this IEnumerable<TSource> source, Func<TSource,bool> predicate)
{
  foreach (TSource element in source)
    if (predicate (element))
      yield return element;
}
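Because Where is an iterator, "upon demand" can be observed directly. The sketch below (OnDemandDemo is a hypothetical name) counts predicate calls: building the query runs the lambda zero times, and enumerating it runs the lambda once per element of the five-element array.

```csharp
using System.Collections.Generic;
using System.Linq;

static class OnDemandDemo
{
  // Counts how often the Where predicate runs, before and after enumeration.
  public static (int BeforeEnumeration, int AfterEnumeration) Run()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
    int calls = 0;

    IEnumerable<string> query =
      names.Where (n => { calls++; return n.Contains ("a"); });

    int before = calls;                  // Building the query calls no lambda
    foreach (string name in query) { }   // Enumerating calls it per element
    return (before, calls);
  }
}
```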


Lambda expressions and Func signatures 


The standard query operators utilize generic Func delegates. Func is a family of
general-purpose generic delegates in the System namespace, defined with the
following intent:


The type arguments in Func appear in the same order they do in lambda 
expressions. 


Hence, Func<TSource,bool> matches a TSource=>bool lambda expression: one that 
accepts a TSource argument and returns a bool value. 


Similarly, Func<TSource,TResult> matches a TSource=>TResult lambda 
expression. 
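A short sketch of that correspondence, using string in place of TSource: a string=>bool lambda is assignable to Func<string,bool>, and a string=>int lambda to Func<string,int>. FuncDemo and its members are hypothetical names.

```csharp
using System;

static class FuncDemo
{
  // A string=>bool lambda (a predicate) matches Func<string,bool>.
  static readonly Func<string, bool> containsA = n => n.Contains ("a");

  // A string=>int lambda matches Func<string,int>.
  static readonly Func<string, int> length = n => n.Length;

  public static (bool ContainsA, int Length) Run (string s)
    => (containsA (s), length (s));
}
```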


The Func delegates are listed in “Lambda Expressions” on page 165 in Chapter 4. 


Lambda expressions and element typing 


The standard query operators use the following type parameter names: 


Generic type letter   Meaning
TSource               Element type for the input sequence
TResult               Element type for the output sequence (if different from TSource)
TKey                  Element type for the key used in sorting, grouping, or joining





TSource is determined by the input sequence. TResult and TKey are typically
inferred from your lambda expression.


For example, consider the signature of the Select query operator: 


public static IEnumerable<TResult> Select<TSource, TResult> 
(this IEnumerable<TSource> source, Func<TSource,TResult> selector) 
Func<TSource,TResult> matches a TSource=>TResult lambda expression: one that
maps an input element to an output element. TSource and TResult can be different 
types, so the lambda expression can change the type of each element. Further, the 
lambda expression determines the output sequence type. The following query uses 
Select to transform string-type elements to integer-type elements: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
IEnumerable<int> query = names.Select (n => n.Length);

foreach (int length in query)
  Console.Write (length + "|");    // 3|4|5|4|3|


The compiler can infer the type of TResult from the return value of the lambda
expression. In this case, n.Length returns an int value, so TResult is inferred to be
int.
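To see that inference is only a convenience, the sketch below calls Select twice: once letting the compiler infer TSource and TResult, and once supplying them explicitly as <string, int>. Both produce the same sequence. InferenceDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

static class InferenceDemo
{
  public static (int[] Lengths, bool Same) Run()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    // TSource (string) and TResult (int) inferred from the arguments...
    IEnumerable<int> inferred = names.Select (n => n.Length);

    // ...or supplied explicitly; the same method is called either way.
    IEnumerable<int> explicitArgs = names.Select<string, int> (n => n.Length);

    return (inferred.ToArray(), inferred.SequenceEqual (explicitArgs));
  }
}
```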


The Where query operator is simpler and requires no type inference for the output 
because input and output elements are of the same type. This makes sense because 
the operator merely filters elements; it does not transform them: 


public static IEnumerable<TSource> Where<TSource> 
(this IEnumerable<TSource> source, Func<TSource,bool> predicate) 


Finally, consider the signature of the OrderBy operator: 


// Slightly simplified: 
public static IEnumerable<TSource> OrderBy<TSource, TKey> 
(this IEnumerable<TSource> source, Func<TSource,TKey> keySelector) 


Func<TSource, TKey> maps an input element to a sorting key. TKey is inferred from 
your lambda expression and is separate from the input and output element types. 
For instance, we could choose to sort a list of names by length (int key) or
alphabetically (string key):


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
IEnumerable<string> sortedByLength, sortedAlphabetically;
sortedByLength       = names.OrderBy (n => n.Length);   // int key
sortedAlphabetically = names.OrderBy (n => n);          // string key
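Materializing both queries shows what each key produces. OrderBy performs a stable sort, so names of equal length (Tom and Jay, Dick and Mary) keep their original relative order. OrderByKeyDemo is a hypothetical name used for this sketch.

```csharp
using System.Linq;

static class OrderByKeyDemo
{
  public static (string[] ByLength, string[] Alphabetical) Run()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
    string[] byLength     = names.OrderBy (n => n.Length).ToArray();  // int key
    string[] alphabetical = names.OrderBy (n => n).ToArray();         // string key
    return (byLength, alphabetical);
  }
}
```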


You can call the query operators in Enumerable with
traditional delegates that refer to methods instead of lambda
expressions. This approach is effective in simplifying certain
kinds of local queries (particularly with LINQ to XML) and
is demonstrated in Chapter 10. It doesn't work with
IQueryable<T>-based sequences, however (e.g., when
querying a database), because the operators in Queryable require
lambda expressions in order to emit expression trees. We
discuss this later in “Interpreted Queries” on page 398.
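A minimal sketch of the delegate approach for a local query: a named method, here a hypothetical ContainsA, is passed to Where as a method group, which the compiler converts to Func<string,bool>.

```csharp
using System.Linq;

static class MethodGroupDemo
{
  // Hypothetical named predicate used in place of a lambda expression.
  static bool ContainsA (string n) => n.Contains ("a");

  public static string[] Run()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
    return names.Where (ContainsA).ToArray();   // Method group -> Func<string,bool>
  }
}
```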
Natural Ordering


The original ordering of elements within an input sequence is significant in LINQ. 
Some query operators rely on this ordering, such as Take, Skip, and Reverse. 


The Take operator outputs the first x elements, discarding the rest: 


int[] numbers = { 10, 9, 8, 7, 6 };
IEnumerable<int> firstThree = numbers.Take (3);     // { 10, 9, 8 }

The Skip operator ignores the first x elements and outputs the rest:

IEnumerable<int> lastTwo = numbers.Skip (3);        // { 7, 6 }

Reverse does exactly as it says:

IEnumerable<int> reversed = numbers.Reverse();      // { 6, 7, 8, 9, 10 }


With local queries (LINQ-to-objects), operators such as Where and Select preserve 
the original ordering of the input sequence (as do all other query operators, except 
for those that specifically change the ordering). 
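Because Skip and Take depend on the input's natural ordering, they combine into a simple paging pattern. The following is a sketch only; PagingDemo.Page is a hypothetical helper, not a LINQ operator.

```csharp
using System.Collections.Generic;
using System.Linq;

static class PagingDemo
{
  // Returns the zero-based page of the given size, in the input's natural order.
  public static int[] Page (IEnumerable<int> source, int pageIndex, int pageSize)
    => source.Skip (pageIndex * pageSize).Take (pageSize).ToArray();
}
```

For example, paging { 10, 9, 8, 7, 6 } with a page size of 2 yields { 10, 9 }, then { 8, 7 }, then { 6 }.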


Other Operators 


Not all query operators return a sequence. The element operators extract one element
from the input sequence; examples are First, Last, and ElementAt:


int[] numbers = { 10, 9, 8, 7, 6 };

int firstNumber  = numbers.First();                             // 10
int lastNumber   = numbers.Last();                              // 6
int secondNumber = numbers.ElementAt (1);                       // 9
int secondLowest = numbers.OrderBy (n => n).Skip (1).First();   // 7


Because these operators return a single element, you don’t usually call further query 
operators on their result unless that element itself is a collection. 


The aggregation operators return a scalar value, usually of numeric type: 


int count = numbers.Count();   // 5
int min   = numbers.Min();     // 6


The quantifiers return a bool value: 


bool hasTheNumberNine = numbers.Contains (9); // true 
bool hasMoreThanZeroElements = numbers.Any(); // true 
bool hasAnOddElement = numbers.Any (n => n% 2 != 0); // true 


Some query operators accept two input sequences. Examples are Concat, which 
appends one sequence to another, and Union, which does the same but with dupli- 
cates removed: 


int[] seq1 = { 1, 2, 3 };
int[] seq2 = { 3, 4, 5 };
IEnumerable<int> concat = seq1.Concat (seq2);   // { 1, 2, 3, 3, 4, 5 }
IEnumerable<int> union  = seq1.Union (seq2);    // { 1, 2, 3, 4, 5 }
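To check those results in runnable form, here is a small sketch; Concatenated and United are hypothetical helper names that simply wrap the two operators. Union enumerates the first sequence and then the second, yielding each element that has not already been yielded, so first-seen order is kept.

```csharp
using System;
using System.Linq;

class Program
{
    public static string Concatenated (int[] a, int[] b) => string.Join (", ", a.Concat (b));
    public static string United (int[] a, int[] b)       => string.Join (", ", a.Union (b));

    static void Main()
    {
        int[] seq1 = { 1, 2, 3 };
        int[] seq2 = { 3, 4, 5 };
        Console.WriteLine (Concatenated (seq1, seq2));   // 1, 2, 3, 3, 4, 5  (duplicate 3 kept)
        Console.WriteLine (United (seq1, seq2));         // 1, 2, 3, 4, 5     (duplicate removed)
    }
}
```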
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The joining operators also fall into this category. Chapter 9 covers all of the query 
operators in detail. 


Query Expressions 


C# provides a syntactic shortcut for writing LINQ queries, called query expressions. 
Contrary to popular belief, a query expression is not a means of embedding SQL 
into C#. In fact, the design of query expressions was inspired primarily by list com- 
prehensions from functional programming languages such as LISP and Haskell, 
although SQL had a cosmetic influence. 


In this book, we refer to query expression syntax simply as 
query syntax. 


In the preceding section, we wrote a fluent-syntax query to extract strings containing
the letter “a”, sorted by length and converted to uppercase. Here’s the same thing
in query syntax:

using System;
using System.Collections.Generic;
using System.Linq;

class LinqDemo
{
  static void Main()
  {
    string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    IEnumerable<string> query =
      from    n in names
      where   n.Contains ("a")     // Filter elements
      orderby n.Length             // Sort elements
      select  n.ToUpper();         // Translate each element (project)

    foreach (string name in query) Console.WriteLine (name);
  }
}

OUTPUT:
JAY
MARY
HARRY


Query expressions always start with a from clause and end with either a select or a
group clause. The from clause declares a range variable (in this case, n), which you
can think of as traversing the input sequence, rather like foreach. Figure 8-2
illustrates the complete syntax as a railroad diagram.
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To read this diagram, start at the left and then proceed along 
the track as if you were a train. For instance, after the mandatory
from clause, you can optionally include an orderby,
where, let, or join clause. After that, you can either continue 
with a select or group clause, or go back and include another 
from, orderby, where, let, or join clause. 
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Figure 8-2. Query syntax 
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The compiler processes a query expression by translating it into fluent syntax. It 
does this in a fairly mechanical fashion—much like it translates foreach statements 
into calls to GetEnumerator and MoveNext. This means that anything you can write 
in query syntax you can also write in fluent syntax. The compiler (initially) trans- 
lates our example query into the following: 

IEnumerable<string> query = names.Where   (n => n.Contains ("a"))
                                 .OrderBy (n => n.Length)
                                 .Select  (n => n.ToUpper());
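Because the translation is mechanical, the two forms can be compared directly. This sketch (the method names are illustrative) builds the example query both ways and checks that they yield the same sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    // Query syntax, as written in the text
    public static IEnumerable<string> QuerySyntax() =>
        from n in names
        where n.Contains ("a")
        orderby n.Length
        select n.ToUpper();

    // The fluent chain that the compiler translates it into
    public static IEnumerable<string> FluentSyntax() =>
        names.Where (n => n.Contains ("a"))
             .OrderBy (n => n.Length)
             .Select (n => n.ToUpper());

    static void Main()
    {
        Console.WriteLine (QuerySyntax().SequenceEqual (FluentSyntax()));   // True
        Console.WriteLine (string.Join (" ", FluentSyntax()));              // JAY MARY HARRY
    }
}
```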










The Where, OrderBy, and Select operators then resolve using the same rules that 
would apply if the query were written in fluent syntax. In this case, they bind to 
extension methods in the Enumerable class because the System.Linq namespace is 
imported and names implements IEnumerable<string>. The compiler doesn't 
specifically favor the Enumerable class, however, when translating query expres- 
sions. You can think of the compiler as mechanically injecting the words Where, 
OrderBy, and Select into the statement and then compiling it as though you had 
typed the method names yourself. This offers flexibility in how they resolve. The 
operators in the database queries that we write in later sections, for instance, will 
bind instead to extension methods in Queryable. 


If we remove the using System.Linq directive from our
program, the query would not compile, since the Where, OrderBy,
and Select methods would have nowhere to bind. Query
expressions cannot compile unless you import System.Linq,
or another namespace with an implementation of these query
methods.
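The binding rule cuts both ways: any type with suitably shaped methods can appear in a query expression, even without importing System.Linq. The sketch below uses a hypothetical single-value Box<T> type (not a collection, and not part of any library); its instance Select method is all the compiler needs to translate a from...select query.

```csharp
using System;   // note: no using System.Linq

// Hypothetical wrapper type; unrelated to Enumerable or Queryable
class Box<T>
{
    public T Value { get; }
    public Box (T value) => Value = value;

    // The compiler only looks for a method named Select with a compatible shape
    public Box<TResult> Select<TResult> (Func<T, TResult> selector) =>
        new Box<TResult> (selector (Value));
}

class Program
{
    public static int Demo()
    {
        var box = new Box<int> (21);
        Box<int> doubled = from x in box select x * 2;   // translated to box.Select (x => x * 2)
        return doubled.Value;
    }

    static void Main() => Console.WriteLine (Demo());   // 42
}
```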


Range Variables 


The identifier immediately following the from keyword is called the range
variable. A range variable refers to the current element in the sequence on which the
operation is to be performed.


In our examples, the range variable n appears in every clause in the query. And yet, 
the variable actually enumerates over a different sequence with each clause: 


from n in names // n is our range variable 

where n.Contains ("a") // n = directly from the array 
orderby n.Length // n = subsequent to being filtered 
select n.ToUpper() // n = subsequent to being sorted 


This becomes clear when we examine the compiler’s mechanical translation to flu- 
ent syntax: 


names.Where (n => n.Contains ("a")) // Locally scoped n 
.OrderBy (n => n.Length) // Locally scoped n 
.Select (n => n.ToUpper()) // Locally scoped n 


As you can see, each instance of n is scoped privately to its own lambda expression.

Query expressions also let you introduce new range variables via the following
clauses:

• let
• into
• An additional from clause
• join
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We cover these later in this chapter in “Composition Strategies” on page 392, as well 
as in Chapter 9, in “Projecting” on page 423 and “Joining” on page 423. 


Query Syntax Versus SQL Syntax 


Query expressions look superficially like SQL, yet the two are very different. A 
LINQ query boils down to a C# expression, and so follows standard C# rules. For 
example, with LINQ, you cannot use a variable before you declare it. In SQL, you 
can reference a table alias in the SELECT clause before defining it in a FROM clause. 


A subquery in LINQ is just another C# expression and so requires no special syntax. 
Subqueries in SQL are subject to special rules. 


With LINQ, data logically flows from left to right through the query. With SQL, the 
order is less well structured with regard to data flow. 


A LINQ query comprises a conveyor belt or pipeline of operators that accept and 
emit sequences whose element order can matter. A SQL query comprises a network 
of clauses that work mostly with unordered sets. 


Query Syntax Versus Fluent Syntax 


Query and fluent syntax each have advantages. 


Query syntax is simpler for queries that involve any of the following: 


• A let clause for introducing a new variable alongside the range variable

• SelectMany, Join, or GroupJoin, followed by an outer range variable reference


(We describe the let clause in “Composition Strategies” on page 392; we describe 
SelectMany, Join, and GroupJoin in Chapter 9.) 


The middle ground is queries that involve the simple use of Where, OrderBy, and 
Select. Either syntax works well; the choice here is largely personal. 


For queries that comprise a single operator, fluent syntax is shorter and less 
cluttered. 


Finally, there are many operators that have no keyword in query syntax. These 
require that you use fluent syntax—at least in part. This means any operator outside 
of the following: 


Where, Select, SelectMany 
OrderBy, ThenBy, OrderByDescending, ThenByDescending 
GroupBy, Join, GroupJoin 


Mixed-Syntax Queries 


If a query operator has no query-syntax support, you can mix query syntax and flu- 
ent syntax. The only restriction is that each query-syntax component must be com- 
plete (i.e., start with a from clause and end with a select or group clause). 
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Assuming this array declaration: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

the following example counts the number of names containing the letter “a”:

int matches = (from n in names where n.Contains ("a") select n).Count();   // 3


The next query obtains the first name in alphabetical order: 
string first = (from n in names orderby n select n).First(); // Dick 


The mixed-syntax approach is sometimes beneficial in more complex queries. With
these simple examples, however, we could stick to fluent syntax throughout without
penalty:

int matches = names.Where (n => n.Contains ("a")).Count();   // 3
string first = names.OrderBy (n => n).First();               // Dick


There are times when mixed-syntax queries offer by far the 
highest “bang for the buck” in terms of function and simplic- 
ity. It’s important not to unilaterally favor either query or flu- 
ent syntax; otherwise, you'll be unable to write mixed-syntax 
queries when they are the best option. 


Where applicable, the remainder of this chapter shows key concepts in both fluent 
and query syntax. 


Deferred Execution 


An important feature of most query operators is that they execute not when con- 
structed, but when enumerated (in other words, when MoveNext is called on its enu- 
merator). Consider the following query: 


var numbers = new List<int> { 1 };

IEnumerable<int> query = numbers.Select (n => n * 10);    // Build query

numbers.Add (2);                    // Sneak in an extra element

foreach (int n in query)
  Console.Write (n + "|");          // 10|20|


The extra number that we sneaked into the list after constructing the query is 
included in the result because it’s not until the foreach statement runs that any fil- 
tering or sorting takes place. This is called deferred or lazy execution and is the same 
as what happens with delegates: 


Action a = () => Console.WriteLine ("Foo"); 
// We've not written anything to the Console yet. Now let's run it: 
a(); // Deferred execution! 
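A simple way to observe deferred execution is to count how often the selector runs. In this sketch the calls counter and the CountSelectorCalls name are added purely for illustration; building the query invokes nothing, while enumerating it runs the lambda once per element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Reports how many times the selector has run after building, then after enumerating
    public static (int afterBuild, int afterEnumerate) CountSelectorCalls()
    {
        int calls = 0;
        var numbers = new List<int> { 1, 2, 3 };

        IEnumerable<int> query = numbers.Select (n => { calls++; return n * 10; });
        int afterBuild = calls;               // Select has only constructed the query

        foreach (int value in query) { }      // each MoveNext invokes the selector
        return (afterBuild, calls);
    }

    static void Main()
    {
        var (afterBuild, afterEnumerate) = CountSelectorCalls();
        Console.WriteLine (afterBuild + " then " + afterEnumerate);   // 0 then 3
    }
}
```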


All standard query operators provide deferred execution, with the following 
exceptions: 
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• Operators that return a single element or scalar value, such as First or Count

• The following conversion operators:
  ToArray, ToList, ToDictionary, ToLookup, ToHashSet


These operators cause immediate query execution because their result types have no 
mechanism to provide deferred execution. The Count method, for instance, returns 
a simple integer, which doesn’t then get enumerated. The following query is exe- 
cuted immediately: 


int matches = numbers.Where (n => n <= 2).Count();   // 2


Deferred execution is important because it decouples query construction from query 
execution. This allows you to construct a query in several steps, and makes database 
queries possible. 


Subqueries provide another level of indirection. Everything in 
a subquery is subject to deferred execution, including aggre- 
gation and conversion methods. We describe this in “Subquer- 
ies” on page 388. 


Reevaluation 


Deferred execution has another consequence: a deferred execution query is reevalu- 
ated when you reenumerate: 


var numbers = new List<int>() { 1, 2 };

IEnumerable<int> query = numbers.Select (n => n * 10);
foreach (int n in query) Console.Write (n + "|");   // 10|20|

numbers.Clear();
foreach (int n in query) Console.Write (n + "|");   // <nothing>


There are a couple of reasons why reevaluation is sometimes disadvantageous: 


• Sometimes, you want to “freeze” or cache the results at a certain point in time.

• Some queries are computationally intensive (or rely on querying a remote
  database), so you don't want to unnecessarily repeat them.


You can defeat reevaluation by calling a conversion operator such as ToArray or 
ToList. ToArray copies the output of a query to an array; ToList copies to a generic 
List<T>: 


var numbers = new List<int>() { 1, 2 };

List<int> timesTen = numbers
  .Select (n => n * 10)
  .ToList();                // Executes immediately into a List<int>
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numbers.Clear(); 
Console.WriteLine (timesTen.Count); // Still 2 
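Putting the two behaviors side by side: in the sketch below (CompareAfterClear is an illustrative name), the deferred query reflects the cleared list, whereas the List<int> produced by ToList keeps the snapshot taken when it ran.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static (int deferredCount, int cachedCount) CompareAfterClear()
    {
        var numbers = new List<int> { 1, 2 };

        IEnumerable<int> deferred = numbers.Select (n => n * 10);           // reevaluated on each enumeration
        List<int> cached          = numbers.Select (n => n * 10).ToList();  // runs now; results copied

        numbers.Clear();
        return (deferred.Count(), cached.Count);
    }

    static void Main()
    {
        var (deferredCount, cachedCount) = CompareAfterClear();
        Console.WriteLine (deferredCount + " vs " + cachedCount);   // 0 vs 2
    }
}
```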


Captured Variables 


If your query’s lambda expressions capture outer variables, the query will honor the 
value of those variables at the time the query runs: 


int[] numbers = { 1, 2 };

int factor = 10;
IEnumerable<int> query = numbers.Select (n => n * factor);
factor = 20;
foreach (int n in query) Console.Write (n + "|");   // 20|40|


This can be a trap when building up a query within a for loop. For example, sup- 
pose that we want to remove all vowels from a string. The following, although inef- 
ficient, gives the correct result: 


IEnumerable<char> query = "Not what you might expect";
query = query.Where (c => c != 'a');
query = query.Where (c => c != 'e');
query = query.Where (c => c != 'i');
query = query.Where (c => c != 'o');
query = query.Where (c => c != 'u');

foreach (char c in query) Console.Write (c);   // Nt wht y mght xpct
Now watch what happens when we refactor this with a for loop: 


IEnumerable<char> query = "Not what you might expect";
string vowels = "aeiou";

for (int i = 0; i < vowels.Length; i++)
  query = query.Where (c => c != vowels[i]);

foreach (char c in query) Console.Write (c);


An IndexOutOfRangeException is thrown upon enumerating the query because, as 
we saw in Chapter 4 (see “Capturing Outer Variables” on page 166), the compiler 
scopes the iteration variable in the for loop as if it were declared outside the loop. 
Hence, each closure captures the same variable (i) whose value is 5 when the query 
is actually enumerated. To solve this, you must assign the loop variable to another 
variable declared inside the statement block: 


for (int i = 0; i < vowels.Length; i++)
{
  char vowel = vowels[i];
  query = query.Where (c => c != vowel);
}


This forces a fresh local variable to be captured on each loop iteration. 
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Another way to solve the problem is to replace the for loop 
with a foreach loop: 
foreach (char vowel in vowels)
  query = query.Where (c => c != vowel);
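Here is the foreach-based fix as a complete sketch (StripVowels is an illustrative name). It relies on the rule, in place since C# 5, that a foreach iteration variable is logically fresh on each iteration, so each Where lambda captures its own vowel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static string StripVowels (string text)
    {
        IEnumerable<char> query = text;
        foreach (char vowel in "aeiou")
            query = query.Where (c => c != vowel);   // captures this iteration's vowel
        return new string (query.ToArray());        // the query is enumerated here
    }

    static void Main() =>
        Console.WriteLine (StripVowels ("Not what you might expect"));   // Nt wht y mght xpct
}
```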


How Deferred Execution Works 
Query operators provide deferred execution by returning decorator sequences. 


Unlike a traditional collection class such as an array or linked list, a decorator 
sequence (in general) has no backing structure of its own to store elements. Instead, 
it wraps another sequence that you supply at runtime, to which it maintains a per- 
manent dependency. Whenever you request data from a decorator, it in turn must 
request data from the wrapped input sequence. 


The query operator’s transformation constitutes the “decora- 
tion.” If the output sequence performed no transformation, it 
would be a proxy rather than a decorator. 


Calling Where merely constructs the decorator wrapper sequence, which holds a ref- 
erence to the input sequence, the lambda expression, and any other arguments sup- 
plied. The input sequence is enumerated only when the decorator is enumerated. 


Figure 8-3 illustrates the composition of the following query: 


IEnumerable<int> lessThanTen = new int[] { 5, 12, 3 }.Where (n => n < 10); 





Figure 8-3. Decorator sequence (a Where decorator, lessThanTen, wrapping the input array and holding the predicate n => n < 10)


When you enumerate lessThanTen, you are, in effect, querying the array through
the Where decorator.


The good news—should you ever want to write your own query operator—is that 
implementing a decorator sequence is easy with a C# iterator. Here's how you can 
write your own Select method: 


public static IEnumerable<TResult> MySelect<TSource, TResult> 
(this IEnumerable<TSource> source, Func<TSource,TResult> selector) 
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{ 
foreach (TSource element in source) 
yield return selector (element); 


} 


This method is an iterator by virtue of the yield return statement. Functionally, 
it’s a shortcut for the following: 


public static IEnumerable<TResult> MySelect<TSource, TResult>
  (this IEnumerable<TSource> source, Func<TSource,TResult> selector)
{
  return new SelectSequence (source, selector);
}


where SelectSequence is a (compiler-written) class whose enumerator encapsulates 
the logic in the iterator method. 


Hence, when you call an operator such as Select or Where, you’re doing nothing
more than instantiating an enumerable class that decorates the input sequence.
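To see roughly what such a class does, here is a hand-written approximation. It is a sketch only: the real compiler-generated type has a different name and extra state handling, and the SelectSequence<TSource,TResult> and MyExtensions names here are assumptions. The key point is that the constructor merely stores the input sequence and the selector; the input is touched only inside MoveNext.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written stand-in for the compiler-generated iterator class
class SelectSequence<TSource, TResult> : IEnumerable<TResult>
{
    readonly IEnumerable<TSource> source;       // the wrapped input sequence
    readonly Func<TSource, TResult> selector;   // the "decoration"

    public SelectSequence (IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        this.source = source;                   // store only; nothing is enumerated here
        this.selector = selector;
    }

    public IEnumerator<TResult> GetEnumerator() =>
        new Enumerator (source.GetEnumerator(), selector);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    class Enumerator : IEnumerator<TResult>
    {
        readonly IEnumerator<TSource> inner;
        readonly Func<TSource, TResult> selector;

        public Enumerator (IEnumerator<TSource> inner, Func<TSource, TResult> selector)
        {
            this.inner = inner;
            this.selector = selector;
        }

        public TResult Current { get; private set; }
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (!inner.MoveNext()) return false;   // request data from the wrapped sequence
            Current = selector (inner.Current);    // apply the transformation
            return true;
        }

        public void Reset() => throw new NotSupportedException();
        public void Dispose() => inner.Dispose();
    }
}

static class MyExtensions
{
    public static IEnumerable<TResult> MySelect<TSource, TResult>
        (this IEnumerable<TSource> source, Func<TSource, TResult> selector) =>
        new SelectSequence<TSource, TResult> (source, selector);
}

class Program
{
    static void Main()
    {
        IEnumerable<int> query = new[] { 1, 2, 3 }.MySelect (n => n * 10);   // builds a decorator only
        Console.WriteLine (string.Join ("|", query));                        // 10|20|30
    }
}
```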


Chaining Decorators 


Chaining query operators creates a layering of decorators. Consider the following 
query: 
IEnumerable<int> query = new int[] { 5, 12, 3 }.Where   (n => n < 10)
                                               .OrderBy (n => n)
                                               .Select  (n => n * 10);


Each query operator instantiates a new decorator that wraps the previous sequence 
(rather like a Russian nesting doll). Figure 8-4 illustrates the object model of this 
query. Note that this object model is fully constructed prior to any enumeration. 
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Figure 8-4. Layered decorator sequences 
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When you enumerate query, you’re querying the original array, transformed
through a layering or chain of decorators.


Adding ToList onto the end of this query would cause the 
preceding operators to execute immediately, collapsing the 
whole object model into a single list. 


Figure 8-5 shows the same object composition in Unified Modeling Language 
(UML) syntax. Select’s decorator references the OrderBy decorator, which 
references Where’s decorator, which references the array. A feature of deferred exe- 
cution is that you build the identical object model if you compose the query 
progressively: 
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Figure 8-5. UML decorator composition 


IEnumerable<int>
  source   = new int[] { 5, 12, 3 },
  filtered = source   .Where   (n => n < 10),
  sorted   = filtered .OrderBy (n => n),
  query    = sorted   .Select  (n => n * 10);


How Queries Are Executed 
Here are the results of enumerating the preceding query: 


foreach (int n in query) Console.WriteLine (n); 


OUTPUT: 
30 
50 


Behind the scenes, the foreach calls GetEnumerator on Select’s decorator (the last 
or outermost operator), which kicks off everything. The result is a chain of enumer- 
ators that structurally mirrors the chain of decorator sequences. Figure 8-6 illus- 
trates the flow of execution as enumeration proceeds. 
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In the first section of this chapter, we depicted a query as a production line of con- 
veyor belts. Extending this analogy, we can say a LINQ query is a lazy production 
line, where the conveyor belts roll elements only upon demand. Constructing a 
query constructs a production line—with everything in place—but with nothing 
rolling. Then, when the consumer requests an element (enumerates over the query), 
the rightmost conveyor belt activates; this in turn triggers the others to roll—as and 
when input sequence elements are needed. LINQ follows a demand-driven pull 
model, rather than a supply-driven push model. This is important—as you'll see 
later—in allowing LINQ to scale to querying SQL databases. 
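The pull model can be observed by logging from inside the lambdas. This sketch (RunTrace and the log list are illustrative) uses only Where and Select; OrderBy is left out because, to sort, it must read its entire input before yielding its first element, which would hide the interleaving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    public static List<string> RunTrace()
    {
        var log = new List<string>();
        int[] source = { 5, 12, 3 };

        IEnumerable<int> query = source
            .Where  (n => { log.Add ("Where "  + n); return n < 10; })
            .Select (n => { log.Add ("Select " + n); return n * 10; });

        foreach (int result in query)
            log.Add ("Consumer " + result);    // the consumer pulls one element at a time

        return log;
    }

    static void Main()
    {
        // Where 5, Select 5, Consumer 50, Where 12, Where 3, Select 3, Consumer 30
        Console.WriteLine (string.Join (", ", RunTrace()));
    }
}
```

Note that 12 never reaches Select: the Where stage pulls the next input element itself until it finds one to pass on.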
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Figure 8-6. Execution of a local query 


Subqueries 


A subquery is a query contained within another query’s lambda expression. The fol- 
lowing example uses a subquery to sort musicians by their last name: 


string[] musos = 
{ "David Gilmour", "Roger Waters", "Rick Wright", "Nick Mason" }; 


IEnumerable<string> query = musos.OrderBy (m => m.Split().Last());


m.Split converts each string into a collection of words, upon which we then call the 
Last query operator. m.Split().Last is the subquery; query references the outer 
query. 

Subqueries are permitted because you can put any valid C# expression on the right- 
hand side of a lambda. A subquery is simply another C# expression. This means 
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that the rules for subqueries are a consequence of the rules for lambda expressions 
(and the behavior of query operators in general). 


The term subquery, in the general sense, has a broader mean- 
ing. For the purpose of describing LINQ, we use the term only 
for a query referenced from within the lambda expression of 
another query. In a query expression, a subquery amounts to a 
query referenced from an expression in any clause except the 
from clause. 


A subquery is privately scoped to the enclosing expression and can reference 
parameters in the outer lambda expression (or range variables in a query 
expression). 


m.Split().Last is a very simple subquery. The next query retrieves all strings in an 
array whose length matches that of the shortest string: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" }; 


IEnumerable<string> outerQuery = names
  .Where (n => n.Length == names.OrderBy (n2 => n2.Length)
                                .Select  (n2 => n2.Length).First());

// Tom, Jay


Here's the same thing as a query expression: 


IEnumerable<string> outerQuery =
  from   n in names
  where  n.Length ==
           (from n2 in names orderby n2.Length select n2.Length).First()
  select n;


Because the outer range variable (n) is in scope for a subquery, we cannot reuse n as 
the subquery’s range variable. 


A subquery is executed whenever the enclosing lambda expression is evaluated. 
This means that a subquery is executed upon demand, at the discretion of the outer 
query. You could say that execution proceeds from the outside in. Local queries fol- 
low this model literally; interpreted queries (e.g., database queries) follow this 
model conceptually. 


The subquery executes as and when required, to feed the outer query. As Figure 8-7 
and Figure 8-8 illustrate, the subquery in our example (the top conveyor belt in 
Figure 8-7) executes once for every outer loop iteration. 


We can express our preceding subquery more succinctly as follows: 
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IEnumerable<string> query =
  from   n in names
  where  n.Length == names.OrderBy (n2 => n2.Length).First().Length
  select n;








Figure 8-7. Subquery composition


With the Min aggregation function, we can simplify the query further: 


IEnumerable<string> query =
  from   n in names
  where  n.Length == names.Min (n2 => n2.Length)
  select n;


In “Interpreted Queries” on page 398, we describe how remote sources such as SQL
tables can be queried. Our example makes an ideal database query because it would
be processed as a unit, requiring only one round trip to the database server. This
query, however, is inefficient for a local collection because the subquery is
recalculated on each outer loop iteration. We can avoid this inefficiency by running
the subquery separately (so that it’s no longer a subquery):


int shortest = names.Min (n => n.Length);

IEnumerable<string> query = from   n in names
                            where  n.Length == shortest
                            select n;
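The cost difference is easy to measure for a local collection. In this sketch the calls counters are added purely for illustration: with the subquery, Min's selector runs once for every name on every outer iteration (5 × 5 = 25 times), whereas the factored-out version runs it only 5 times.

```csharp
using System;
using System.Linq;

class Program
{
    static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    // Subquery inside the where clause: reevaluated for each outer element
    public static int SubqueryCalls()
    {
        int calls = 0;
        var query = from n in names
                    where n.Length == names.Min (n2 => { calls++; return n2.Length; })
                    select n;
        query.ToList();     // force execution
        return calls;
    }

    // Subquery factored out: Min runs once, before the query is built
    public static int FactoredCalls()
    {
        int calls = 0;
        int shortest = names.Min (s => { calls++; return s.Length; });
        var query = from n in names where n.Length == shortest select n;
        query.ToList();
        return calls;
    }

    static void Main() =>
        Console.WriteLine (SubqueryCalls() + " vs " + FactoredCalls());   // 25 vs 5
}
```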


Factoring out subqueries in this manner is nearly always 
desirable when querying local collections. An exception is 
when the subquery is correlated, meaning that it references the 
outer range variable. We explore correlated subqueries in 
“Projecting” on page 423 in Chapter 9. 
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Figure 8-8. UML subquery composition 


Subqueries and Deferred Execution 


An element or aggregation operator such as First or Count in a subquery doesn't 
force the outer query into immediate execution—deferred execution still holds for 
the outer query. This is because subqueries are called indirectly—through a delegate 
in the case of a local query, or through an expression tree in the case of an inter- 
preted query. 


An interesting case arises when you include a subquery within a Select expression. 
In the case of a local query, you’re actually projecting a sequence of queries, each
itself subject to deferred execution. The effect is generally transparent, and it serves 
to further improve efficiency. We revisit Select subqueries in some detail in 
Chapter 9. 
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Composition Strategies 


In this section, we describe three strategies for building more complex queries: 


• Progressive query construction
• Using the into keyword
• Wrapping queries


All are chaining strategies and produce identical runtime queries. 


Progressive Query Building 
At the start of the chapter, we demonstrated how you could build a fluent query 
progressively: 


var filtered = names    .Where   (n => n.Contains ("a"));
var sorted   = filtered .OrderBy (n => n);
var query    = sorted   .Select  (n => n.ToUpper());


Because each of the participating query operators returns a decorator sequence, the 
resultant query is the same chain or layering of decorators that you would get from 
a single-expression query. There are a couple of potential benefits, however, to 
building queries progressively: 


• It can make queries easier to write.

• You can add query operators conditionally. For example:

    if (includeFilter) query = query.Where (...)

  This is more efficient than:

    query = query.Where (n => !includeFilter || <expression>)

  because it avoids adding an extra query operator if includeFilter is false.
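A minimal sketch of conditional composition, assuming a hypothetical Build method with an includeFilter flag and an arbitrary example filter (names containing “a”). When the flag is false, the Where decorator is simply never added to the chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // includeFilter is a hypothetical flag; the filter itself is only an example
    public static List<string> Build (string[] names, bool includeFilter)
    {
        IEnumerable<string> query = names;

        if (includeFilter)
            query = query.Where (n => n.Contains ("a"));   // operator added only when needed

        query = query.OrderBy (n => n);
        return query.ToList();
    }

    static void Main()
    {
        string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
        Console.WriteLine (string.Join (", ", Build (names, true)));    // Harry, Jay, Mary
        Console.WriteLine (string.Join (", ", Build (names, false)));   // Dick, Harry, Jay, Mary, Tom
    }
}
```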


A progressive approach is often useful in query comprehensions. To illustrate, 
imagine that we want to remove all vowels from a list of names and then present in 
alphabetical order those whose length is still more than two characters. In fluent 
syntax, we could write this query as a single expression—by projecting before we 
filter: 


IEnumerable<string> query = names
  .Select  (n => n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                  .Replace ("o", "").Replace ("u", ""))
  .Where   (n => n.Length > 2)
  .OrderBy (n => n);

// Dck
// Hrry
// Mry
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Rather than calling string’s Replace method five times, we 
could remove vowels from a string more efficiently with a reg- 
ular expression: 

n => Regex.Replace (n, "[aeiou]", "") 


string’s Replace method has the advantage, though, of also 
working in database queries. 


Translating this directly into a query expression is troublesome because the select 
clause must come after the where and orderby clauses. And if we rearrange the 
query so as to project last, the result would be different: 


IEnumerable<string> query =
  from    n in names
  where   n.Length > 2
  orderby n
  select  n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
           .Replace ("o", "").Replace ("u", "");

// Dck
// Hrry
// Jy
// Mry
// Tm


Fortunately, there are a number of ways to get the original result in query syntax. 
The first is by querying progressively: 


IEnumerable<string> query =
  from   n in names
  select n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
          .Replace ("o", "").Replace ("u", "");

query = from n in query where n.Length > 2 orderby n select n;

// Dck
// Hrry
// Mry


The into Keyword 


The into keyword is interpreted in two very different ways by 
query expressions, depending on context. The meaning we're 
describing now is for signaling query continuation (the other is 
for signaling a GroupJoin). 


The into keyword lets you “continue” a query after a projection and is a shortcut for 
progressively querying. With into, we can rewrite the preceding query as follows: 
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IEnumerable<string> query =
  from   n in names
  select n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
          .Replace ("o", "").Replace ("u", "")
  into noVowel
    where noVowel.Length > 2 orderby noVowel select noVowel;
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The only place you can use into is after a select or group clause. into restarts a 
query, allowing you to introduce fresh where, orderby, and select clauses. 


Although it’s easiest to think of into as restarting a query 
from the perspective of a query expression, it’s all one query 
when translated to its final fluent form. Hence, there’s no 
intrinsic performance hit with into. Nor do you lose any 
points for its use! 


The equivalent of into in fluent syntax is simply a longer chain of operators. 


Scoping rules 


All range variables are out of scope following an into keyword. The following will 
not compile: 


var query =
  from n1 in names
  select n1.ToUpper()
  into n2                           // Only n2 is visible from here on.
    where n1.Contains ("x")         // Illegal: n1 is not in scope.
    select n2;


To see why, consider how this maps to fluent syntax: 


var query = names
  .Select (n1 => n1.ToUpper())
  .Where  (n2 => n1.Contains ("x"));     // Error: n1 no longer in scope.


The original name (n1) is lost by the time the Where filter runs. Where’s input 
sequence contains only uppercase names, so it cannot filter based on n1. 


Wrapping Queries 


A query built progressively can be formulated into a single statement by wrapping 
one query around another. In general terms: 


var tempQuery = tempQueryExpr 
var finalQuery = from ... in tempQuery ... 


can be reformulated as: 
var finalQuery = from ... in (tempQueryExpr) 


Wrapping is semantically identical to progressive query building or using the into 
keyword (without the intermediate variable). The end result in all cases is a linear 
chain of query operators. For example, consider the following query: 


IEnumerable<string> query =
  from   n in names
  select n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
          .Replace ("o", "").Replace ("u", "");

query = from n in query where n.Length > 2 orderby n select n;
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Reformulated in wrapped form, it’s the following: 


IEnumerable<string> query =
  from n1 in
  (
    from n2 in names
    select n2.Replace ("a", "").Replace ("e", "").Replace ("i", "")
             .Replace ("o", "").Replace ("u", "")
  )
  where n1.Length > 2 orderby n1 select n1;


When converted to fluent syntax, the result is the same linear chain of operators as 
in previous examples: 


IEnumerable<string> query = names
  .Select  (n => n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                  .Replace ("o", "").Replace ("u", ""))
  .Where   (n => n.Length > 2)
  .OrderBy (n => n);


(The compiler does not emit the final .Select (n => n), because it’s redundant.) 
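Since progressive building, into, and wrapping all reduce to the same operator chain, they should produce identical results. This sketch checks that claim for the vowel-stripping example; the StripVowels helper is an assumption introduced only to avoid repeating the Replace chain four times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

    // Hypothetical helper wrapping the five Replace calls from the text
    static string StripVowels (string s) =>
        s.Replace ("a", "").Replace ("e", "").Replace ("i", "").Replace ("o", "").Replace ("u", "");

    public static IEnumerable<string> Progressive()
    {
        IEnumerable<string> query = from n in names select StripVowels (n);
        return from n in query where n.Length > 2 orderby n select n;
    }

    public static IEnumerable<string> WithInto() =>
        from n in names
        select StripVowels (n)
        into noVowel
        where noVowel.Length > 2 orderby noVowel select noVowel;

    public static IEnumerable<string> Wrapped() =>
        from n1 in (from n2 in names select StripVowels (n2))
        where n1.Length > 2 orderby n1 select n1;

    public static IEnumerable<string> Fluent() =>
        names.Select (n => StripVowels (n)).Where (n => n.Length > 2).OrderBy (n => n);

    static void Main()
    {
        Console.WriteLine (string.Join (", ", Fluent()));   // Dck, Hrry, Mry

        bool allSame = Progressive().SequenceEqual (Fluent())
                    && WithInto().SequenceEqual (Fluent())
                    && Wrapped().SequenceEqual (Fluent());
        Console.WriteLine (allSame);                         // True
    }
}
```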


Wrapped queries can be confusing because they resemble the subqueries we wrote 
earlier. Both have the concept of an inner and outer query. When converted to flu- 
ent syntax, however, you can see that wrapping is simply a strategy for sequentially 
chaining operators. The end result bears no resemblance to a subquery, which 
embeds an inner query within the lambda expression of another. 


Returning to a previous analogy: when wrapping, the inner query amounts to the 
preceding conveyor belts. In contrast, a subquery rides above a conveyor belt and is 
activated upon demand through the conveyor belt's lambda worker (as illustrated in 
Figure 8-7). 


Projection Strategies 


Object Initializers 


So far, all our select clauses have projected scalar element types. With C# object 
initializers, you can project into more complex types. For example, suppose, as a 
first step in a query, we want to strip vowels from a list of names while still retaining 
the original versions alongside, for the benefit of subsequent queries. We can write 
the following class to assist: 


class TempProjectionItem
{
  public string Original;    // Original name
  public string Vowelless;   // Vowel-stripped name
}


We then can project into it with object initializers:

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

IEnumerable<TempProjectionItem> temp =
  from n in names
  select new TempProjectionItem
  {
    Original = n,
    Vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                 .Replace ("o", "").Replace ("u", "")
  };


The result is of type IEnumerable<TempProjectionItem>, which we can subse- 
quently query: 


IEnumerable<string> query = from item in temp
                            where item.Vowelless.Length > 2
                            select item.Original;

// Dick
// Harry
// Mary


Anonymous Types 


Anonymous types allow you to structure your intermediate results without writing 
special classes. We can eliminate the TempProjectionItem class in our previous 
example with anonymous types: 


var intermediate = from n in names
  select new
  {
    Original = n,
    Vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                 .Replace ("o", "").Replace ("u", "")
  };

IEnumerable<string> query = from item in intermediate
                            where item.Vowelless.Length > 2
                            select item.Original;


This gives the same result as the previous example, but without needing to write a 
one-off class. The compiler does the job, instead, generating a temporary class with 
fields that match the structure of our projection. This means, however, that the 
intermediate query has the following type: 


IEnumerable <random-compiler-generated-name> 


The only way we can declare a variable of this type is with the var keyword. In this 
case, var is more than just a clutter reduction device; it’s a necessity. 


We can write the entire query more succinctly with the into keyword: 


var query = from n in names
  select new
  {
    Original = n,
    Vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                 .Replace ("o", "").Replace ("u", "")
  }
  into temp
  where temp.Vowelless.Length > 2
  select temp.Original;


Query expressions provide a shortcut for writing this kind of query: the let 
keyword. 


The let Keyword

The let keyword introduces a new variable alongside the range variable.


With let, we can write a query extracting strings whose length, excluding vowels, 
exceeds two characters, as follows: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" }; 


IEnumerable<string> query =
  from n in names
  let vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                   .Replace ("o", "").Replace ("u", "")
  where vowelless.Length > 2
  orderby vowelless
  select n;   // Thanks to let, n is still in scope.


The compiler resolves a let clause by projecting into a temporary anonymous type 
that contains both the range variable and the new expression variable. In other 
words, the compiler translates this query into the preceding example. 
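As a rough illustration of that translation, the following sketch writes the let query by hand in fluent syntax, projecting into an anonymous type that carries both n and vowelless. The member names x, n, and vowelless are ours; the compiler's actual generated names differ. The class name LetTranslationDemo is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetTranslationDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  // The query as written with let.
  public static List<string> WithLet() =>
    (from n in names
     let vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                      .Replace ("o", "").Replace ("u", "")
     where vowelless.Length > 2
     orderby vowelless
     select n).ToList();

  // Hand-written equivalent: an anonymous type carries both the range
  // variable (n) and the let variable (vowelless) down the chain.
  public static List<string> ByHand() =>
    names
      .Select (n => new
      {
        n,
        vowelless = n.Replace ("a", "").Replace ("e", "").Replace ("i", "")
                     .Replace ("o", "").Replace ("u", "")
      })
      .Where (x => x.vowelless.Length > 2)
      .OrderBy (x => x.vowelless)
      .Select (x => x.n)
      .ToList();

  static void Main()
  {
    Console.WriteLine (string.Join (", ", WithLet()));        // Dick, Harry, Mary
    Console.WriteLine (WithLet().SequenceEqual (ByHand()));   // True
  }
}
```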


let accomplishes two things: 


• It projects new elements alongside existing elements.

• It allows an expression to be used repeatedly in a query without being
rewritten.


The let approach is particularly advantageous in this example because it allows the 
select clause to project either the original name (n) or its vowel-removed version 
(vowelless). 


You can have any number of let clauses, before or after a where clause (see
Figure 8-2). A let clause can reference variables introduced in earlier let clauses
(subject to the boundaries imposed by an into clause). let reprojects all existing
variables transparently.


A let expression need not evaluate to a scalar type: sometimes it’s useful to have it 
evaluate to a subsequence, for instance. 
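For instance, in the following sketch (our own example, not from the book; the variable name consonants is hypothetical), the let clause binds each name to the subsequence of its non-vowel characters, which the where and select clauses then reuse:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetSubsequenceDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  public static List<string> Run() =>
    (from n in names
     let consonants = n.Where (c => "aeiouAEIOU".IndexOf (c) < 0)   // IEnumerable<char>
     where consonants.Count() > 2
     select n + ": " + new string (consonants.ToArray())).ToList();

  static void Main()
  {
    foreach (string s in Run()) Console.WriteLine (s);
    // Dick: Dck
    // Harry: Hrry
    // Mary: Mry
  }
}
```

Because consonants is a lazily evaluated sequence, it is enumerated once by Count and again by ToArray; for longer inputs you might materialize it inside the let clause (for example with ToArray) so the work is done only once.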
Interpreted Queries


LINQ provides two parallel architectures: local queries for local object collections, 
and interpreted queries for remote data sources. So far, we've examined the architec- 
ture of local queries, which operate over collections implementing IEnumerable<T>. 
Local queries resolve to query operators in the Enumerable class (by default), which 
in turn resolve to chains of decorator sequences. The delegates that they accept— 
whether expressed in query syntax, fluent syntax, or traditional delegates—are fully 
local to IL code, just like any other C# method. 


By contrast, interpreted queries are descriptive. They operate over sequences that 
implement IQueryable<T>, and they resolve to the query operators in the 
Queryable class, which emit expression trees that are interpreted at runtime. These 
expression trees can be translated, for instance, to SQL queries, allowing you to use 
LINQ to query a database. 
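The difference between the two architectures comes down to delegates versus expression trees. The following sketch (the class name is illustrative) assigns the same lambda to a Func<string,bool> and to an Expression<Func<string,bool>>: the delegate is compiled code that can only be invoked, whereas the expression is a data structure that can be inspected at runtime and compiled on demand.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionVsDelegateDemo
{
  // Compiled to IL: opaque, can only be invoked.
  public static readonly Func<string, bool> AsDelegate = n => n.Contains ("a");

  // Compiled to an object model describing the lambda.
  public static readonly Expression<Func<string, bool>> AsTree = n => n.Contains ("a");

  static void Main()
  {
    Console.WriteLine (AsDelegate ("Mary"));        // True
    Console.WriteLine (AsTree.Body.NodeType);       // Call
    Console.WriteLine (AsTree.Parameters[0].Name);  // n

    var call = (MethodCallExpression) AsTree.Body;
    Console.WriteLine (call.Method.Name);           // Contains

    Func<string, bool> compiled = AsTree.Compile(); // expression tree -> delegate
    Console.WriteLine (compiled ("Tom"));           // False
  }
}
```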


The query operators in Enumerable can actually work with 
IQueryable<T> sequences. The difficulty is that the resultant 
queries always execute locally on the client. This is why a sec- 
ond set of query operators is provided in the Queryable class. 


To write interpreted queries, you need to start with an API that exposes sequences 
of type IQueryable<T>. An example is Microsoft’s Entity Framework Core (EF 
Core), which allows you to query a variety of databases, including SQL Server, Ora- 
cle, MySQL, PostgreSQL, and SQLite. 


It's also possible to generate an IQueryable<T> wrapper around an ordinary enu- 
merable collection by calling the AsQueryable method. We describe AsQueryable in 
“Building Query Expressions” on page 416. 
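Here is a minimal sketch of such a wrapper over an in-memory array (the class name is illustrative). Because the result is typed IQueryable<string>, Where binds to Queryable.Where and records an expression tree; the in-memory provider that AsQueryable supplies then executes the query locally when it is enumerated.

```csharp
using System;
using System.Linq;

static class AsQueryableDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  public static IQueryable<string> BuildQuery() =>
    names.AsQueryable()                  // IQueryable<string> over a local array
         .Where (n => n.Contains ("a")); // binds to Queryable.Where

  static void Main()
  {
    IQueryable<string> query = BuildQuery();
    Console.WriteLine (query.ElementType);           // System.String
    Console.WriteLine (query.Expression.NodeType);   // Call
    Console.WriteLine (string.Join (", ", query));   // Harry, Mary, Jay
  }
}
```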


IQueryable<T> is an extension of IEnumerable<T> with addi- 
tional methods for constructing expression trees. Most of the 
time you can ignore the details of these methods; they’re 
called indirectly by the Framework. “Building Query Expres- 
sions” on page 416 covers IQueryable<T> in more detail. 


To illustrate, let’s create a simple customer table in SQL Server and populate it with 
a few names using the following SQL script: 


create table Customer 
( 

ID int not null primary key, 

Name varchar (30) 
) 
insert Customer values (1, 'Tom') 
insert Customer values (2, 'Dick') 
insert Customer values (3, 'Harry') 
insert Customer values (4, 'Mary') 
insert Customer values (5, 'Jay') 
With this table in place, we can write an interpreted LINQ query in C# that uses EF
Core to retrieve customers whose name contains the letter “a,” as follows:

using System;
using System.Linq;
using Microsoft.EntityFrameworkCore; 


public class Customer 
{ 
public int ID { get; set; } 
public string Name { get; set; } 
} 


// We'll explain the following class in more detail in the next section. 
public class NutshellContext : DbContext 
{ 


public virtual DbSet<Customer> Customers { get; set; } 


protected override void OnConfiguring (DbContextOptionsBuilder builder) 
=> builder.UseSqlServer ("...connection string..."); 


protected override void OnModelCreating (ModelBuilder modelBuilder) 
=> modelBuilder.Entity<Customer>().ToTable ("Customer") 
.HasKey (c => c.ID); 
} 


class Program
{
  static void Main()
  {
    using var dbContext = new NutshellContext();

    IQueryable<string> query = from c in dbContext.Customers
                               where c.Name.Contains ("a")
                               orderby c.Name.Length
                               select c.Name.ToUpper();

    foreach (string name in query) Console.WriteLine (name);
  }
}


EF Core translates this query into the following SQL: 


SELECT UPPER([c].[Name])
FROM [Customer] AS [c]
WHERE CHARINDEX(N'a', [c].[Name]) > 0
ORDER BY CAST(LEN([c].[Name]) AS int)

Here’s the end result:

// JAY
// MARY
// HARRY
How Interpreted Queries Work
Let’s examine how the preceding query is processed. 


First, the compiler converts query syntax to fluent syntax. This is done exactly as 
with local queries: 


IQueryable<string> query = dbContext.Customers
  .Where (n => n.Name.Contains ("a"))
  .OrderBy (n => n.Name.Length)
  .Select (n => n.Name.ToUpper());


Next, the compiler resolves the query operator methods. Here's where local and 
interpreted queries differ—interpreted queries resolve to query operators in the 
Queryable class instead of the Enumerable class. 


To see why, we need to look at the dbContext.Customers variable, the source upon 
which the entire query builds. dbContext.Customers is of type DbSet<T>, which 
implements IQueryable<T> (a subtype of IEnumerable<T>). This means that the 
compiler has a choice in resolving Where: it could call the extension method in 
Enumerable or the following extension method in Queryable: 


public static IQueryable<TSource> Where<TSource> (this 
IQueryable<TSource> source, Expression <Func<TSource,bool>> predicate) 


The compiler chooses Queryable.Where because its signature is a more specific 
match. 
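You can observe this choice using the AsQueryable wrapper in place of a database. In the sketch below (names are illustrative), the same lambda is applied to a source whose static type is IQueryable<string> and to the same kind of object viewed through IEnumerable<string>. Inspecting the outermost method call in the first query’s expression tree shows that it bound to Queryable.Where; the second bound to Enumerable.Where and is no longer queryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class OverloadDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  public static Type WhereDeclaringType()
  {
    IQueryable<string> source = names.AsQueryable();
    IQueryable<string> query = source.Where (n => n.Length > 3);   // Queryable.Where
    var call = (MethodCallExpression) query.Expression;
    return call.Method.DeclaringType;
  }

  public static bool LocalResultIsQueryable()
  {
    // An IQueryable at runtime, but statically typed as IEnumerable<string>.
    IEnumerable<string> source = names.AsQueryable();
    IEnumerable<string> query = source.Where (n => n.Length > 3);  // Enumerable.Where
    return query is IQueryable<string>;
  }

  static void Main()
  {
    Console.WriteLine (WhereDeclaringType());      // System.Linq.Queryable
    Console.WriteLine (LocalResultIsQueryable());  // False
  }
}
```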


Queryable.Where accepts a predicate wrapped in an Expression<TDelegate> type.
This instructs the compiler to translate the supplied lambda expression—in other
words, n=>n.Name.Contains("a")—to an expression tree rather than a compiled
delegate. An expression tree is an object model based on the types in
System.Linq.Expressions that can be inspected at runtime (so that EF Core can
later translate it to a SQL statement).


Because Queryable.Where also returns IQueryable<T>, the same process follows 
with the OrderBy and Select operators. Figure 8-9 illustrates the end result. In the 
shaded box, there is an expression tree describing the entire query, which can be 
traversed at runtime. 
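To see that a single expression tree describes the whole query, the following sketch builds the same Where/OrderBy/Select chain over an AsQueryable wrapper (standing in for dbContext.Customers, which this sketch does not have) and walks from the outermost method call inward. The class and method names are our own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class TreeWalkDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  public static List<string> OperatorNames()
  {
    IQueryable<string> query = names.AsQueryable()
      .Where (n => n.Contains ("a"))
      .OrderBy (n => n.Length)
      .Select (n => n.ToUpper());

    var operators = new List<string>();
    Expression e = query.Expression;

    // Each operator is a MethodCallExpression whose first argument is the
    // expression for the preceding stage; the innermost source is a constant.
    while (e is MethodCallExpression call)
    {
      operators.Add (call.Method.Name);
      e = call.Arguments[0];
    }
    return operators;   // outermost operator first
  }

  static void Main() =>
    Console.WriteLine (string.Join (" <- ", OperatorNames()));   // Select <- OrderBy <- Where
}
```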
Figure 8-9. Interpreted query composition


Execution 


Interpreted queries follow a deferred execution model—just like local queries. This 
means that the SQL statement is not generated until you start enumerating the 
query. Further, enumerating the same query twice results in the database being 
queried twice. 
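The deferred model is easy to observe with an in-memory IQueryable<T>. In the sketch below (illustrative names), the query is defined before the source list changes, yet each execution reflects the list’s current contents, because no work happens until the query runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
  public static (int before, int after) Run()
  {
    var names = new List<string> { "Tom", "Dick" };
    IQueryable<string> query = names.AsQueryable().Where (n => n.Length == 3);

    int before = query.Count();   // executes now: matches "Tom"
    names.Add ("Jay");            // modify the source after defining the query
    int after = query.Count();    // executes again: matches "Tom" and "Jay"
    return (before, after);
  }

  static void Main() => Console.WriteLine (Run());   // (1, 2)
}
```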


Under the hood, interpreted queries differ from local queries in how they execute. 
When you enumerate over an interpreted query, the outermost sequence runs a 
program that traverses the entire expression tree, processing it as a unit. In our 
example, EF Core translates the expression tree to a SQL statement, which it then 
executes, yielding the results as a sequence. 


To work, EF Core needs to understand the schema of the data- 
base. It does this by leveraging conventions, code attributes, 
and a fluent configuration API. We'll explore this in detail 
later in the chapter. 


We said previously that a LINQ query is like a production line. However, when you 
enumerate an IQueryable conveyor belt, it doesn’t start up the whole production 
line, like with a local query. Instead, just the IQueryable belt starts up, with a special 
enumerator that calls upon a production manager. The manager reviews the entire 
production line—which consists not of compiled code, but of dummies (method call 
expressions) with instructions pasted to their foreheads (expression trees). The man- 
ager then traverses all the expressions, in this case transcribing them to a single 
piece of paper (a SQL statement), which it then executes, feeding the results back to
the consumer. Only one belt turns; the rest of the production line is a network of 
empty shells, existing just to describe what needs to be done. 


This has some practical implications. For instance, with local queries, you can write 
your own query methods (fairly easily, with iterators) and then use them to supple- 
ment the predefined set. With remote queries, this is difficult, and even undesirable. 
If you wrote a MyWhere extension method accepting IQueryable<T>, it would be like 
putting your own dummy into the production line. The production manager 
wouldn't know what to do with your dummy. Even if you intervened at this stage, 
your solution would be hard-wired to a particular provider, such as EF Core, and 
would not work with other IQueryable implementations. Part of the benefit of hav- 
ing a standard set of methods in Queryable is that they define a standard vocabulary 
for querying any remote collection. As soon as you try to extend the vocabulary, 
you’re no longer interoperable.


Another consequence of this model is that an IQueryable provider might be unable 
to cope with some queries—even if you stick to the standard methods. EF Core is 
limited by the capabilities of the database server; some LINQ queries have no SQL 
translation. If you're familiar with SQL, you'll have a good intuition for what these 
are, although at times you need to experiment to see what causes a runtime error; it 
can be surprising what does work! 


Combining Interpreted and Local Queries 


A query can include both interpreted and local operators. A typical pattern is to 
have the local operators on the outside and the interpreted components on the 
inside; in other words, the interpreted queries feed the local queries. This pattern 
works well when querying a database. 


For instance, suppose that we write a custom extension method to pair up strings in 
a collection: 


public static IEnumerable<string> Pair (this IEnumerable<string> source)
{
  string firstHalf = null;
  foreach (string element in source)
    if (firstHalf == null)
      firstHalf = element;
    else
    {
      yield return firstHalf + ", " + element;
      firstHalf = null;
    }
}


We can use this extension method in a query that mixes EF Core and local 
operators: 


using var dbContext = new NutshellContext ();
IEnumerable<string> q = dbContext.Customers
  .Select (c => c.Name.ToUpper())
  .OrderBy (n => n)
  .Pair()                                // Local from this point on.
  .Select ((n, i) => "Pair " + i.ToString() + " = " + n);

foreach (string element in q) Console.WriteLine (element);

// Pair 0 = DICK, HARRY
// Pair 1 = JAY, MARY


Because dbContext.Customers is of a type implementing IQueryable<T>, the 
Select operator resolves to Queryable.Select. This returns an output sequence 
also of type IQueryable<T>, so the OrderBy operator similarly resolves to 
Queryable.OrderBy. But the next query operator, Pair, has no overload accepting 
IQueryable<T>—only the less specific IEnumerable<T>. So, it resolves to our local 
Pair method—wrapping the interpreted query in a local query. Pair also returns 
IEnumerable<T>, so the Select that follows resolves to another local operator.


On the EF Core side, the resulting SQL statement is equivalent to this:

SELECT UPPER([c].[Name]) FROM [Customer] AS [c] ORDER BY UPPER([c].[Name])


The remaining work is done locally. In effect, we end up with a local query (on the 
outside) whose source is an interpreted query (the inside). 
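To try the pattern without a database, the following sketch substitutes an AsQueryable wrapper for dbContext.Customers. This is an assumption for illustration: the in-memory provider translates nothing to SQL, but operator binding works the same way. It reuses the Pair method from the text; the class names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairExtensions
{
  // Pair from the text: joins elements two at a time; a leftover
  // odd element at the end is dropped.
  public static IEnumerable<string> Pair (this IEnumerable<string> source)
  {
    string firstHalf = null;
    foreach (string element in source)
      if (firstHalf == null)
        firstHalf = element;
      else
      {
        yield return firstHalf + ", " + element;
        firstHalf = null;
      }
  }
}

static class MixedQueryDemo
{
  static readonly string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

  public static IEnumerable<string> BuildQuery() =>
    names.AsQueryable()                 // stands in for dbContext.Customers
         .Select (n => n.ToUpper())     // Queryable.Select
         .OrderBy (n => n)              // Queryable.OrderBy
         .Pair()                        // IEnumerable<string> only: local from here on
         .Select ((n, i) => "Pair " + i.ToString() + " = " + n);   // Enumerable.Select

  static void Main()
  {
    IEnumerable<string> q = BuildQuery();
    Console.WriteLine (q is IQueryable<string>);   // False
    foreach (string s in q) Console.WriteLine (s);
    // Pair 0 = DICK, HARRY
    // Pair 1 = JAY, MARY
  }
}
```

TOM is the fifth (odd) element after sorting, so Pair never emits it.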


AsEnumerable 


Enumerable.AsEnumerable is the simplest of all query operators. Here’s its complete
definition:


public static IEnumerable<TSource> AsEnumerable<TSource>
  (this IEnumerable<TSource> source)
{
  return source;
}


Its purpose is to cast an IQueryable<T> sequence to IEnumerable<T>, forcing subse- 
quent query operators to bind to Enumerable operators instead of Queryable opera- 
tors. This causes the remainder of the query to execute locally. 


To illustrate, suppose that we had a MedicalArticles table in SQL Server and 
wanted to use EF Core to retrieve all articles on influenza whose abstract contained 
fewer than 100 words. For the latter predicate, we need a regular expression: 


Regex wordCounter = new Regex (@"\b(\w|[-'])+\b"); 
using var dbContext = new NutshellContext (); 


var query = dbContext.MedicalArticles 
.Where (article => article.Topic == "influenza" && 
wordCounter.Matches (article.Abstract).Count < 100); 


The problem is that SQL Server doesn’t support regular expressions, so EF Core will 
throw an exception, complaining that the query cannot be translated to SQL. We 
can solve this by querying in two steps: first retrieving all articles on influenza 
through an EF Core query, and then filtering locally for abstracts of fewer than 100
words: 


Regex wordCounter = new Regex (@"\b(\w|[-'])+\b"); 
using var dbContext = new NutshellContext (); 


IEnumerable<MedicalArticle> efQuery = dbContext.MedicalArticles
  .Where (article => article.Topic == "influenza");

IEnumerable<MedicalArticle> localQuery = efQuery
  .Where (article => wordCounter.Matches (article.Abstract).Count < 100);


Because efQuery is of type IEnumerable<MedicalArticle>, the second query binds 
to the local query operators, forcing that part of the filtering to run on the client. 


With AsEnumerable, we can do the same in a single query: 


Regex wordCounter = new Regex (@"\b(\w|[-'])+\b"); 
using var dbContext = new NutshellContext (); 


var query = dbContext.MedicalArticles
  .Where (article => article.Topic == "influenza")
  .AsEnumerable()
  .Where (article => wordCounter.Matches (article.Abstract).Count < 100);


An alternative to calling AsEnumerable is to call ToArray or ToList. The advantage 
of AsEnumerable is that it doesn’t force immediate query execution, nor does it cre- 
ate any storage structure. 
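The binding switch can be seen without a database. In this sketch, an array of anonymous-typed rows (hypothetical data standing in for the MedicalArticles table) is wrapped with AsQueryable; the topic filter binds to Queryable.Where, and after AsEnumerable the regex filter binds to Enumerable.Where, so the final query is no longer an IQueryable. The class name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class AsEnumerableDemo
{
  public static (List<string> titles, bool stillQueryable) Run()
  {
    Regex wordCounter = new Regex (@"\b(\w|[-'])+\b");

    // Hypothetical rows standing in for dbContext.MedicalArticles.
    var articles = new[]
    {
      new { Title = "A", Topic = "influenza", Abstract = "Short abstract about flu." },
      new { Title = "B", Topic = "measles",   Abstract = "Short abstract about measles." },
      new { Title = "C", Topic = "influenza", Abstract = string.Join (" ", Enumerable.Repeat ("word", 150)) }
    }.AsQueryable();

    var query = articles
      .Where (a => a.Topic == "influenza")                          // Queryable.Where
      .AsEnumerable()
      .Where (a => wordCounter.Matches (a.Abstract).Count < 100);   // Enumerable.Where

    return (query.Select (a => a.Title).ToList(), query is IQueryable);
  }

  static void Main() => Console.WriteLine (string.Join (", ", Run().titles));   // A
}
```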


Moving query processing from the database server to the cli- 
ent can hurt performance, especially if it means retrieving 
more rows. A more efficient (though more complex) way to 
solve our example would be to use SQL CLR integration to 
expose a function on the database that implemented the regu- 
lar expression. 


We further demonstrate combined interpreted and local queries in Chapter 10. 


EF Core 


Throughout this chapter and Chapter 9, we use EF Core to demonstrate interpreted queries.
Let’s now examine the key features of this technology.


EF Core Entity Classes 


EF Core lets you use any class to represent data, as long as it contains a public prop- 
erty for each column that you want to query. 


For instance, we could define the following entity class to query and update a Cus- 
tomers table in the database: 
public class Customer
{
  public int ID { get; set; }
  public string Name { get; set; }
}


DbContext 


After defining entity classes, the next step is to subclass DbContext. An instance of
that class represents a session working with the database. Typically, your
DbContext subclass will contain one DbSet<T> property for each entity in your
model:


public class NutshellContext : DbContext
{
  public DbSet<Customer> Customers { get; set; }
  // ... properties for other tables ...
}
A DbContext object does three things: 


• It acts as a factory for generating DbSet<> objects that you can query.

• It keeps track of any changes that you make to your entities so that you can
write them back (see “Updates”).

• It provides virtual methods that you can override to configure the connection
and model.


Configuring the connection 


By overriding the OnConfiguring method, you can specify the database provider 
and connection string: 


public class NutshellContext : DbContext
{
  protected override void OnConfiguring (DbContextOptionsBuilder
    optionsBuilder) =>
    optionsBuilder.UseSqlServer
      (@"Server=(local);Database=Nutshell;Trusted_Connection=True");
}


In this example, the connection string is specified as a string literal. Production 
applications would typically retrieve it from a configuration file such as 
appsettings.json. 


UseSqlServer is an extension method defined in an assembly that’s part of the
Microsoft.EntityFrameworkCore.SqlServer NuGet package. Packages are available for
other database providers, including Oracle, MySQL, PostgreSQL, and SQLite.
If you’re using ASP.NET Core, you can allow its dependency injection
framework to preconfigure optionsBuilder; in most
cases, this lets you avoid overriding OnConfiguring altogether.
To enable this, define a constructor on your DbContext subclass as follows:

public NutshellContext (DbContextOptions<NutshellContext> options)
  : base (options) { }

If you do choose to override OnConfiguring (perhaps to provide
a configuration if your DbContext is used in another scenario),
you can check whether options have already been
configured as follows:

protected override void OnConfiguring (
  DbContextOptionsBuilder optionsBuilder)
{
  if (!optionsBuilder.IsConfigured)
  {
    ...
  }
}

In the OnConfiguring method, you can enable other options, including lazy loading
(see “Lazy loading” on page 414).


Configuring the model 


By default, EF Core is convention based, meaning that it infers the database schema 
from your class and property names. 


You can override the defaults using the fluent API by overriding OnModelCreating
and calling extension methods on the ModelBuilder parameter. For example, we
can explicitly specify the database table name for our Customer entity as follows:


protected override void OnModelCreating (ModelBuilder modelBuilder) => 
modelBuilder .Entity<Customer>() 
.ToTable ("Customer"); // Table is called 'Customer' 


Without this code, EF Core would map this entity to a table named Customers
rather than Customer because we have a DbSet<Customer> property in our
DbContext called Customers:


public DbSet<Customer> Customers { get; set; } 
The following code maps all of your entities to table names
that match the entity class name (which is typically singular)
rather than the DbSet<T> property name (which is typically
plural):

protected override void OnModelCreating (ModelBuilder
  modelBuilder)
{
  foreach (IMutableEntityType entityType in
           modelBuilder.Model.GetEntityTypes())
  {
    modelBuilder.Entity (entityType.Name)
                .ToTable (entityType.ClrType.Name);
  }
}


The fluent API offers an expanded syntax for configuring columns. In the next 
example, we use two popular methods: 


• HasColumnName, which maps a property to a differently named column

• IsRequired, which indicates that a column is not nullable


protected override void OnModelCreating (ModelBuilder modelBuilder) =>
  modelBuilder.Entity<Customer> (entity =>
  {
    entity.ToTable ("Customer");
    entity.Property (e => e.Name)
          .HasColumnName ("Full Name")   // Column name is 'Full Name'
          .IsRequired();                 // Column is not nullable
  });


Table 8-1 lists some of the most important methods in the fluent API. 


Instead of using the fluent API, you can configure your model 
by applying special attributes to your entity classes and prop- 
erties (“data annotations”). This approach is less flexible in 
that the configuration must be fixed at compile-time, and less 
powerful in that there are some options that can be configured 
only via the fluent API. 


Table 8-1. Fluent API model configuration methods

ToTable()
  Purpose: Specify the database table name for a given entity
  Example: builder.Entity<Customer>()
             .ToTable ("Customer");

HasColumnName()
  Purpose: Specify the column name for a given property
  Example: builder.Entity<Customer>()
             .Property (c => c.Name)
             .HasColumnName ("Full Name");

HasKey()
  Purpose: Specify a key (usually one that deviates from convention)
  Example: builder.Entity<Customer>()
             .HasKey (c => c.CustomerNr);

IsRequired()
  Purpose: Specify that the property requires a value (is not nullable)
  Example: builder.Entity<Customer>()
             .Property (c => c.Name)
             .IsRequired();

HasMaxLength()
  Purpose: Specify the maximum length of a variable-length type (usually a string)
  Example: builder.Entity<Customer>()
             .Property (c => c.Name)
             .HasMaxLength (60);

HasColumnType()
  Purpose: Specify the database data type for a column
  Example: builder.Entity<Purchase>()
             .Property (p => p.Description)
             .HasColumnType ("varchar(80)");

Ignore()
  Purpose: Ignore a type
  Example: builder.Ignore<Products>();

Ignore()
  Purpose: Ignore a property of a type
  Example: builder.Entity<Customer>()
             .Ignore (c => c.ChatName);

HasIndex()
  Purpose: Specify that a property (or combination of properties) should serve
           in the database as an index
  Example: // Compound index:
           builder.Entity<Purchase>()
             .HasIndex (p => new { p.Date, p.Price });
           // Unique index on one property:
           builder.Entity<MedicalArticle>()
             .HasIndex (a => a.Topic)
             .IsUnique();

HasOne()
  Purpose: See “Navigation Properties” on page 412
  Example: builder.Entity<Purchase>()
             .HasOne (p => p.Customer)
             .WithMany (c => c.Purchases);

HasMany()
  Purpose: See “Navigation Properties” on page 412
  Example: builder.Entity<Customer>()
             .HasMany (c => c.Purchases)
             .WithOne (p => p.Customer);
Creating the database 


EF Core supports a code-first approach, which means that you can start by defining 
entity classes and then ask EF Core to create the database. The easiest way to do the 
latter is to call the following method on a DbContext instance: 


dbContext.Database.EnsureCreated(); 


A better approach, however, is to use EF Core’s migrations feature, which not only 
creates the database, but also configures it such that EF Core can automatically 
update the schema in the future when your entity classes change. You can enable 
migrations in Visual Studio’s Package Manager Console and ask it to create the data- 
base with the following commands: 
Install-Package Microsoft.EntityFrameworkCore.Tools
Add-Migration InitialCreate
Update-Database


The first command installs tools to manage EF Core from within Visual Studio. The 
second command generates a special C# class known as a code migration that con- 
tains instructions to create the database. The final command runs those instructions 
against the database connection string specified in the project’s application configu- 
ration file. 


Using DbContext 


After you've defined entity classes and subclassed DbContext, you can instantiate 
your DbContext and query the database, as follows: 


using var dbContext = new NutshellContext(); 
Console.WriteLine (dbContext.Customers.Count()); 
// Executes "SELECT COUNT(*) FROM [Customer] AS [c]" 


You can also use your DbContext instance to write to the database. The following 
code inserts a row into the Customer table: 


using var dbContext = new NutshellContext();
Customer cust = new Customer()
{
  Name = "Sara Wells"
};
dbContext.Customers.Add (cust);
dbContext.SaveChanges();    // Writes changes back to database


The following queries the database for the customer that was just inserted: 


using var dbContext = new NutshellContext(); 
Customer cust = dbContext.Customers
  .Single (c => c.Name == "Sara Wells");


The following updates that customer’s name and writes the change to the database: 


cust.Name = "Dr. Sara Wells"; 
dbContext.SaveChanges(); 


The Single operator is ideal for retrieving a row by primary 
key. Unlike First, it throws an exception if more than one ele- 
ment is returned. 





Disposing DbContext 


Although DbContext implements IDisposable, you can (in general) get away without 
disposing instances. Disposing forces the context’s connection to dispose—but this 
is usually unnecessary because EF Core closes connections automatically whenever 
you finish retrieving results from a query. 


Disposing a context prematurely can actually be problematic because of lazy evalua- 
tion. Consider the following: 
IQueryable<Customer> GetCustomers (string prefix)
{
  using (var dbContext = new NutshellContext ())
    return dbContext.Customers
                    .Where (c => c.Name.StartsWith (prefix));
}

foreach (Customer c in GetCustomers ("a"))
  Console.WriteLine (c.Name);


This will fail because the query is evaluated when we enumerate it—which is after 
disposing its DbContext. 
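The failure is not specific to EF Core: any deferred query whose source is disposed before enumeration behaves the same way. The sketch below uses a hypothetical FakeContext (not an EF Core type) whose Customers sequence throws ObjectDisposedException once the context has been disposed, mimicking GetCustomers. It also shows the fix of letting the caller own the context’s lifetime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DbContext (not an EF Core type).
class FakeContext : IDisposable
{
  bool _disposed;
  readonly string[] _rows = { "Anne", "Tom", "Andrew" };

  public IEnumerable<string> Customers
  {
    get
    {
      // Iterator: this body runs only when the sequence is enumerated.
      foreach (string row in _rows)
      {
        if (_disposed) throw new ObjectDisposedException (nameof (FakeContext));
        yield return row;
      }
    }
  }

  public void Dispose() => _disposed = true;
}

static class DisposeDemo
{
  // Broken: the context is disposed on return, before anyone enumerates.
  public static IEnumerable<string> GetCustomers (string prefix)
  {
    using (var context = new FakeContext())
      return context.Customers.Where (c => c.StartsWith (prefix));
  }

  // Fixed: the caller passes in (and disposes) the context.
  public static IEnumerable<string> GetCustomers (FakeContext context, string prefix) =>
    context.Customers.Where (c => c.StartsWith (prefix));

  static void Main()
  {
    try { foreach (string c in GetCustomers ("A")) Console.WriteLine (c); }
    catch (ObjectDisposedException) { Console.WriteLine ("Context already disposed"); }

    using var context = new FakeContext();
    foreach (string c in GetCustomers (context, "A")) Console.WriteLine (c);   // Anne, Andrew
  }
}
```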


There are some caveats, though, on not disposing contexts: 


• It relies on the connection object releasing all unmanaged resources on the
Close method. Even though this holds true with SqlConnection, it’s theoretically
possible for a third-party connection to keep resources open if you call
Close but not Dispose (though this would arguably violate the contract defined
by IDbConnection.Close).

• If you manually call GetEnumerator on a query (instead of using foreach) and
then fail to either dispose the enumerator or consume the sequence, the connection
will remain open. Disposing the DbContext provides a backup in such
scenarios.

• Some people feel that it’s tidier to dispose contexts (and all objects that
implement IDisposable).


If you want to explicitly dispose contexts, you must pass a DbContext instance into 
methods such as GetCustomers to avoid the problem described. 


In scenarios such as ASP.NET Core MVC where the context instance is provided via 
dependency injection (DI), the DI infrastructure will manage the context lifetime. It 
will be created when a unit of work (such as an HTTP request processed in the con- 
troller) begins and disposed when that unit of work ends. 











Object Tracking 


A DbContext instance keeps track of all the entities it instantiates, so it can feed the 
same ones back to you whenever you request the same rows in a table. In other 
words, a context in its lifetime will never emit two separate entities that refer to the 
same row in a table (where a row is identified by primary key). This capability is 
called object tracking. 


To illustrate, suppose the customer whose name is alphabetically first also has the 
lowest ID. In the following example, a and b will reference the same object: 


using var dbContext = new NutshellContext (); 


Customer a = dbContext.Customers.OrderBy (c => c.Name).First(); 
Customer b = dbContext.Customers.OrderBy (c => c.ID).First(); 
Consider what happens when EF Core encounters the second query. It starts by
querying the database—and obtaining a single row. It then reads the primary key of 
this row and performs a lookup in the context’s entity cache. Seeing a match, it 
returns the existing object without updating any values. So, if another user had just 
updated that customer’s Name in the database, the new value would be ignored. This 
is essential for avoiding unexpected side effects (the Customer object could be in use 
elsewhere) and also for managing concurrency. If you had altered properties on the 
Customer object and not yet called SaveChanges, you wouldn't want your properties 
automatically overwritten. 
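The lookup described above can be pictured as an identity map keyed by primary key. The following is a deliberately simplified sketch of that idea, not EF Core’s implementation or API: the IdentityMap class and its Materialize method are hypothetical, and each call stands for one row read from the database.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
  public int ID { get; set; }
  public string Name { get; set; }
}

// Hypothetical, simplified identity map (not part of EF Core).
public class IdentityMap
{
  readonly Dictionary<int, Customer> _tracked = new Dictionary<int, Customer>();

  // Called once per row read from the database.
  public Customer Materialize (int id, string name)
  {
    if (_tracked.TryGetValue (id, out Customer existing))
      return existing;   // key already tracked: reuse object, ignore the row's values

    var c = new Customer { ID = id, Name = name };
    _tracked.Add (id, c);
    return c;
  }
}

static class IdentityMapDemo
{
  static void Main()
  {
    var map = new IdentityMap();
    Customer a = map.Materialize (1, "Tom");
    Customer b = map.Materialize (1, "Thomas");   // same key, newer value in the row
    Console.WriteLine (ReferenceEquals (a, b));   // True
    Console.WriteLine (b.Name);                   // Tom
  }
}
```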


You can disable object tracking by chaining the AsNoTracking
extension method to your query or by setting
ChangeTracker.QueryTrackingBehavior on the context to
QueryTrackingBehavior.NoTracking. No-tracking queries are
useful when data is used read-only, as they improve
performance and reduce memory use.


To get fresh information from the database, you must either instantiate a new con- 
text or call the Reload method, as follows: 


dbContext.Entry (myCustomer).Reload(); 


The best practice is to use a fresh DbContext instance per unit of work so that the 
need to manually reload an entity is rare. 


Change Tracking 


When you change a property value in an entity loaded via DbContext, EF Core rec- 
ognizes the change and updates the database accordingly upon calling SaveChanges. 
To do that, it creates a snapshot of the state of entities loaded through your 
DbContext subclass and compares the current state to the original one when 
SaveChanges is called (or when you manually query change tracking, as you'll see in 
a moment). You can enumerate the tracked changes in a DbContext as follows: 


foreach (var e in dbContext.ChangeTracker.Entries())
{
  Console.WriteLine ($"{e.Entity.GetType().FullName} is {e.State}");
  foreach (var m in e.Members)
    Console.WriteLine (
      $"  {m.Metadata.Name}: '{m.CurrentValue}' modified: {m.IsModified}");
}


When you call SaveChanges, EF Core uses the information in the ChangeTracker to 
construct SQL statements that will update the database to match the changes in your 
objects, issuing insert statements to add new rows, update statements to modify 
data, and delete statements to remove rows that were removed from the object 
graph in your DbContext subclass. Any TransactionScope is honored; if none is
present, EF Core wraps all statements in a new transaction.


You can optimize change tracking by implementing INotifyPropertyChanged and, 
optionally, INotifyPropertyChanging in your entities. The former allows EF Core 
to avoid the overhead of comparing modified with original entities; the latter allows
EF Core to avoid storing the original values altogether. After implementing these 
interfaces, call the HasChangeTrackingStrategy method on the ModelBuilder 
when configuring the model in order to activate the optimized change tracking. 


Navigation Properties 


Navigation properties allow you to do the following: 


• Query related tables without having to manually join
• Insert, remove, and update related rows without explicitly updating foreign keys


For example, suppose that a customer can have a number of purchases. We can
represent a one-to-many relationship between Customer and Purchase with the
following entities:


public class Customer
{
  public int ID { get; set; }
  public string Name { get; set; }

  // Child navigation property, which must implement ICollection<T>:
  public virtual List<Purchase> Purchases { get; set; } = new List<Purchase>();
}


public class Purchase
{
  public int ID { get; set; }
  public DateTime Date { get; set; }
  public string Description { get; set; }
  public decimal Price { get; set; }
  public int? CustomerID { get; set; }            // Foreign key field

  public Customer Customer { get; set; }          // Parent navigation property
}


EF Core is able to infer from these entities that CustomerID is a foreign key to the
Customer table, because the name “CustomerID” follows a popular naming
convention. If we were to ask EF Core to create a database from these entities, it would
create a foreign key constraint between Purchase.CustomerID and Customer.ID.


If EF Core is unable to infer the relationship, you can configure
it explicitly in the OnModelCreating method as follows:


modelBuilder.Entity<Purchase>()
.HasOne (e => e.Customer) 
.WithMany (e => e.Purchases) 
.HasForeignKey (e => e.CustomerID); 


With these navigation properties set up, we can write queries such as this: 


var customersWithPurchases = dbContext.Customers.Where (c => c.Purchases.Any());
We cover how to write such queries in detail in Chapter 9.


Adding and removing entities from navigation collections 


When you add new entities to a collection navigation property, EF Core
automatically populates the foreign keys upon calling SaveChanges:


Customer cust = dbContext.Customers.Single (c => c.ID == 1); 


Purchase p1 = new Purchase { Description="Bike", Price=500 }; 
Purchase p2 = new Purchase { Description="Tools", Price=100 }; 


cust.Purchases.Add (p1); 
cust.Purchases.Add (p2); 


dbContext.SaveChanges(); 


In this example, EF Core automatically writes 1 into the CustomerID column of each
of the new purchases and writes the database-generated ID for each purchase to
Purchase.ID.


When you remove an entity from a collection navigation property and call
SaveChanges, EF Core will either clear the foreign key field or delete the
corresponding row from the database, depending on how the relationship has been
configured or inferred. In this case, we've defined Purchase.CustomerID as a nullable
integer (so that we can represent purchases without a customer, or cash transactions),
so removing a purchase from a customer would clear its foreign key field
rather than deleting it from the database.


Loading navigation properties 


When EF Core populates an entity, it does not (by default) populate its navigation 
properties: 


using var dbContext = new NutshellContext(); 
var cust = dbContext.Customers.First(); 
Console.WriteLine (cust.Purchases.Count);   // Always 0


One solution is to use the Include extension method, which instructs EF Core to 
eagerly load navigation properties: 


var cust = dbContext.Customers
  .Include (c => c.Purchases)
  .Where (c => c.ID == 2).First();


Another solution is to use a projection. This technique is particularly useful when 
you need to work with only some of the entity properties, because it reduces data 
transfer: 


var custInfo = dbContext.Customers
  .Where (c => c.ID == 2)
  .Select (c => new
  {
    Name = c.Name,
    Purchases = c.Purchases.Select (p => new { p.Description, p.Price })
  })
  .First();

Both of these techniques inform EF Core what data you require so that it can be
Both of these techniques inform EF Core what data you require so that it can be 
fetched in a single database query. It’s also possible to manually instruct EF Core to 
populate a navigation property as needed: 


dbContext.Entry (cust).Collection (b => b.Purchases).Load(); 
// cust.Purchases is now populated. 


This is called explicit loading. Unlike the preceding approaches, this generates an 
extra round trip to the database. 


Lazy loading 


Another approach for loading navigation properties is called lazy loading. When 
enabled, EF Core populates navigation properties on demand, by generating a proxy 
class for each of your entity classes that intercepts attempts to access unloaded
navigation properties. For this to work, each navigation property must be virtual and the
class it’s defined in must be inheritable (not sealed). Also, the context must not have 
been disposed when the lazy load occurs, so that an additional database request can 
be performed. 


You can enable lazy loading in the OnConfiguring method of your DbContext
subclass, as follows:


protected override void OnConfiguring (DbContextOptionsBuilder optionsBuilder)
{
  // Chain this with your usual database provider configuration (e.g., UseSqlServer).
  optionsBuilder
    .UseLazyLoadingProxies();
}


(You will also need to add a reference to the Microsoft.EntityFrameworkCore.Proxies
NuGet package.)


The cost of lazy loading is that EF Core must make an additional request to the 
database each time you access an unloaded navigation property. If you make many 
such requests, performance can suffer as a result of excessive round-tripping. 


With lazy loading enabled, the runtime type of your classes is 
a proxy derived from your entity class; for example: 


using var dbContext = new NutshellContext(); 
var cust = dbContext.Customers.First(); 
Console.WriteLine (cust.GetType()); 

// Castle.Proxies.CustomerProxy 
Deferred Execution


EF Core queries are subject to deferred execution, just like local queries. This allows 
you to build queries progressively. There is one aspect, however, in which EF Core 
has special deferred execution semantics, and that is when a subquery appears 
within a Select expression. 


With local queries, you get double-deferred execution, because from a functional 
perspective, you're selecting a sequence of queries. So, if you enumerate the outer 
result sequence, but never enumerate the inner sequences, the subquery will never 
execute. 
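Double-deferred execution can be observed without a database. The following is a minimal standalone sketch (C# 9 top-level statements; the arrays and the innerCalls counter are illustrative and not from the text) that counts how often the inner subquery's predicate actually runs:

```csharp
using System;
using System.Linq;

int innerCalls = 0;   // incremented each time the inner predicate runs

int[] outer = { 1, 2, 3 };
int[] inner = { 10, 20, 30 };

// Each outer element is projected to a *query* over 'inner', not to its results.
var query = outer.Select (o => inner.Where (i => { innerCalls++; return i > o * 10; }));

// Enumerate only the outer sequence: three subqueries are created, none executes.
int outerCount = 0;
foreach (var subquery in query) outerCount++;
int callsAfterOuterOnly = innerCalls;
Console.WriteLine (callsAfterOuterOnly);                 // 0

// Enumerating one inner sequence finally runs that subquery's predicate.
int[] firstResults = query.First().ToArray();
Console.WriteLine (innerCalls);                          // 3
Console.WriteLine (string.Join (", ", firstResults));    // 20, 30
```

Enumerating the outer sequence merely builds the subqueries; the inner predicate runs only when a subquery is itself enumerated.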


With EF Core, the subquery is executed at the same time as the main outer query. 
This avoids excessive round-tripping. 


For example, the following query executes in a single round trip upon reaching the 
first foreach statement: 


using var dbContext = new NutshellContext (); 


var query = from c in dbContext.Customers 
select 
from p in c.Purchases 
select new { c.Name, p.Price }; 


foreach (var customerPurchaseResults in query)
  foreach (var namePrice in customerPurchaseResults)
    Console.WriteLine ($"{namePrice.Name} spent {namePrice.Price}");


Any navigation properties that you explicitly project are fully populated in a single 
round trip: 


var query = from c in dbContext.Customers 
select new { c.Name, c.Purchases }; 


foreach (var row in query) 
foreach (Purchase p in row.Purchases) // No extra round-tripping 
Console.WriteLine (row.Name + " spent " + p.Price); 


But if we enumerate a navigation property without first having either eagerly loaded
or projected, deferred execution rules apply. In the following example, EF Core
executes another Purchases query on each loop iteration (assuming lazy loading is
enabled):


foreach (Customer c in dbContext.Customers.ToArray()) 
foreach (Purchase p in c.Purchases) // Another SQL round trip 
Console.WriteLine (c.Name + " spent " + p.Price); 


This model is advantageous when you want to selectively execute the inner loop, 
based on a test that can be performed only on the client: 


foreach (Customer c in dbContext.Customers.ToArray()) 
if (myWebService.HasBadCreditHistory (c.ID)) 
foreach (Purchase p in c.Purchases) // Another SQL round trip 
Console.WriteLine (c.Name + " spent " + p.Price); 
Note the use of ToArray in the previous two queries. By
default, SQL Server cannot initiate a new query while the 
results of the current query are still being processed. Calling 
ToArray materializes the customers so that additional queries 
can be issued to retrieve purchases per customer. It is possible 
to configure SQL Server to allow multiple active result sets 
(MARS) by appending ;MultipleActiveResultSets=True to 
the database connection string. Use MARS with caution as it 
can mask a chatty database design that could be improved by 
eager loading and/or projecting the required data. 


(In Chapter 9, we explore Select subqueries in more detail, in “Projecting” on page 
423.) 


Building Query Expressions 


So far in this chapter, when we've needed to dynamically compose queries, we've
done so by conditionally chaining query operators. Although this is adequate in
many scenarios, sometimes you need to work at a more granular level and
dynamically compose the lambda expressions that feed the operators.


In this section, we assume the following Product class: 


public class Product 


{ 
public int ID { get; set; } 
public string Description { get; set; } 
public bool Discontinued { get; set; } 
public DateTime LastSale { get; set; } 


} 


Delegates Versus Expression Trees 
Recall that: 


• Local queries, which use Enumerable operators, take delegates.
• Interpreted queries, which use Queryable operators, take expression trees.

We can see this by comparing the signature of the Where operator in Enumerable
and Queryable:


public static IEnumerable<TSource> Where<TSource> (this
  IEnumerable<TSource> source, Func<TSource,bool> predicate)


public static IQueryable<TSource> Where<TSource> (this 
IQueryable<TSource> source, Expression<Func<TSource,bool>> predicate) 


When embedded within a query, a lambda expression looks identical whether it 
binds to Enumerable’s operators or Queryable’s operators: 


IEnumerable<Product> q1 = localProducts.Where (p => !p.Discontinued);
IQueryable<Product> q2 = sqlProducts.Where (p => !p.Discontinued);
When you assign a lambda expression to an intermediate variable, however, you
must be explicit about whether to resolve to a delegate (i.e., Func<>) or an expression
tree (i.e., Expression<Func<>>). In the following example, predicate1 and
predicate2 are not interchangeable:


Func<Product, bool> predicate1 = p => !p.Discontinued;
IEnumerable<Product> q1 = localProducts.Where (predicate1);

Expression<Func<Product, bool>> predicate2 = p => !p.Discontinued;
IQueryable<Product> q2 = sqlProducts.Where (predicate2);
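The same distinction can be seen without EF Core. In this standalone sketch (C# 9 top-level statements), a string array stands in for the Product sources and AsQueryable stands in for a remote IQueryable<T>; the identical lambda text compiles to an opaque delegate in one assignment and to an inspectable expression tree in the other:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Same lambda text, two different things produced by the compiler:
Func<string, bool> asDelegate = s => s.Length > 3;            // executable code
Expression<Func<string, bool>> asTree = s => s.Length > 3;    // data describing the code

string[] localNames = { "Tom", "Dick", "Harry" };
IQueryable<string> queryableNames = localNames.AsQueryable();

// Enumerable.Where takes the delegate; Queryable.Where takes the expression tree.
string[] q1 = localNames.Where (asDelegate).ToArray();
string[] q2 = queryableNames.Where (asTree).ToArray();

// Only the expression tree can be inspected at runtime:
Console.WriteLine (asTree.Body.NodeType);   // GreaterThan
Console.WriteLine (asTree.Body);            // (s.Length > 3)
```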


Compiling expression trees 


You can convert an expression tree to a delegate by calling Compile. This is of
particular value when writing methods that return reusable expressions. To illustrate,
let's add a static method to the Product class that returns a predicate evaluating to
true if a product is not discontinued and has sold in the past 30 days:


public class Product 


{ 


public static Expression<Func<Product, bool>> IsSelling() 


{ 
return p => !p.Discontinued && p.LastSale > DateTime.Now.AddDays (-30); 


} 
} 


The method just written can be used both in interpreted and in local queries, as 
follows: 


void Test() 
{ 
var dbContext = new NutshellContext(); 
Product[] localProducts = dbContext.Products.ToArray(); 


IQueryable<Product> sqlQuery = 
dbContext.Products.Where (Product.IsSelling()); 


IEnumerable<Product> localQuery =
localProducts.Where (Product.IsSelling().Compile()); 


} 

.NET does not provide an API to convert in the reverse
direction, from a delegate to an expression tree. This makes
expression trees more versatile.
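Here is the same reuse pattern as a standalone sketch (C# 9 top-level statements) that runs without a database. IsLong is a hypothetical stand-in for Product.IsSelling, operating on strings so that no entity class is needed:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical reusable predicate, defined once as an expression tree.
static Expression<Func<string, bool>> IsLong() => s => s.Length > 4;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// Interpreted style: Queryable.Where consumes the expression tree as-is.
string[] viaTree = names.AsQueryable().Where (IsLong()).ToArray();

// Local style: Compile converts the tree into an ordinary Func<string,bool>.
Func<string, bool> isLong = IsLong().Compile();
string[] viaDelegate = names.Where (isLong).ToArray();

Console.WriteLine (string.Join (", ", viaDelegate));   // Harry
Console.WriteLine (isLong ("kangaroo"));               // True
```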
The AsQueryable operator lets you write whole queries that can run over either local
or remote sequences: 





IQueryable<Product> FilterSortProducts (IQueryable<Product> input)
{
  return from p in input
         where ...
         orderby ...
         select p;
}

void Test()
{
  var dbContext = new NutshellContext();
  Product[] localProducts = dbContext.Products.ToArray();

  var sqlQuery = FilterSortProducts (dbContext.Products);
  var localQuery = FilterSortProducts (localProducts.AsQueryable());
}


AsQueryable wraps IQueryable<T> clothing around a local sequence so that
subsequent query operators resolve to expression trees. When you later enumerate over
the result, the expression trees are implicitly compiled (at a small performance cost),
and the local sequence enumerates as it would ordinarily.
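As a runnable illustration, the following standalone sketch (C# 9 top-level statements) supplies hypothetical criteria of its own (names longer than three characters, sorted alphabetically) and works over strings rather than Product, so no database is required:

```csharp
using System;
using System.Linq;

// Hypothetical shared query logic, written once against IQueryable<string>.
static IQueryable<string> FilterSortNames (IQueryable<string> input) =>
  from n in input
  where n.Length > 3
  orderby n
  select n;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// AsQueryable makes the operators above build an expression tree...
IQueryable<string> query = FilterSortNames (names.AsQueryable());
Console.WriteLine (query.Expression.NodeType);   // Call

// ...which is compiled and run locally when the result is enumerated.
Console.WriteLine (string.Join (", ", query));   // Dick, Harry, Mary
```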


Expression Trees 


We said previously that an implicit conversion from a lambda expression to 
Expression<TDelegate> causes the C# compiler to emit code that builds an expression
tree. With some programming effort, you can do the same thing manually at
runtime—in other words, dynamically build an expression tree from scratch. The 
result can be cast to an Expression<TDelegate> and used in EF Core queries, or 
compiled into an ordinary delegate by calling Compile. 


The Expression DOM 


An expression tree is a miniature code DOM. Each node in the tree is represented 
by a type in the System.Linq.Expressions namespace. Figure 8-10 illustrates these
types.
Figure 8-10. Expression types


The base class for all nodes is the (nongeneric) Expression class. The generic
Expression<TDelegate> class actually means typed lambda expression and might 
have been named LambdaExpression<TDelegate> if it wasn't for the clumsiness of 
this: 


LambdaExpression<Func<Customer,bool>> f = ...


Expression<T>'s base type is the (nongeneric) LambdaExpression class.
LambdaExpression provides type unification for lambda expression trees: any typed
Expression<T> can be cast to a LambdaExpression.


The thing that distinguishes LambdaExpressions from ordinary Expressions is that 
lambda expressions have parameters. 
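A short standalone sketch (C# 9 top-level statements) of both points: a typed Expression<Func<string,bool>> converts implicitly to the nongeneric LambdaExpression, through which you can read the lambda's parameters and return type:

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<string, bool>> f = s => s.Length < 5;

// Expression<TDelegate> derives from LambdaExpression, so this is an implicit upcast.
LambdaExpression lambda = f;

// Parameters are what set a LambdaExpression apart from other Expression nodes.
ParameterExpression p = lambda.Parameters[0];
Console.WriteLine (lambda.Parameters.Count);   // 1
Console.WriteLine (p.Name);                    // s
Console.WriteLine (p.Type);                    // System.String
Console.WriteLine (lambda.ReturnType);         // System.Boolean
```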


To create an expression tree, don't instantiate node types directly; rather, call static
methods provided on the Expression class, such as Add, And, Call, Constant,
LessThan, and so on.


Figure 8-11 shows the expression tree that the following assignment creates: 


Expression<Func<string, bool>> f = s => s.Length < 5; 








Figure 8-11. Expression tree (recoverable node label: ParameterExpression, Name = “s”, Type = System.String)


We can demonstrate this as follows: 


Console.WriteLine (f.Body.NodeType);                     // LessThan
Console.WriteLine (((BinaryExpression) f.Body).Right);   // 5
Let’s now build this expression from scratch. The principle is that you start from the
bottom of the tree and work your way up. The bottommost thing in our tree is a
ParameterExpression, the lambda expression parameter called “s” of type string:


ParameterExpression p = Expression.Parameter (typeof (string), "s"); 
The next step is to build the MemberExpression and ConstantExpression. In the
former case, we need to access the Length property of our parameter, “s”: 


MemberExpression stringLength = Expression.Property (p, "Length"); 
ConstantExpression five = Expression.Constant (5); 


Next is the LessThan comparison: 
BinaryExpression comparison = Expression.LessThan (stringLength, five); 


The final step is to construct the lambda expression, which links an expression Body 
to a collection of parameters: 


Expression<Func<string, bool>> lambda
  = Expression.Lambda<Func<string, bool>> (comparison, p);


A convenient way to test our lambda is by compiling it to a delegate:


Func<string, bool> runnable = lambda.Compile();


Console.WriteLine (runnable ("kangaroo")); // False 
Console.WriteLine (runnable ("dog")); // True 


The easiest way to determine which expression type to use is 
to examine an existing lambda expression in the Visual Studio 
debugger. 
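If you'd rather inspect a tree from code, a small recursive walker can list each node's NodeType. This standalone sketch (C# 9 top-level statements) descends only into the node kinds that occur in s => s.Length < 5; a general-purpose walker would more typically subclass ExpressionVisitor:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

Expression<Func<string, bool>> f = s => s.Length < 5;
var nodeTypes = new List<ExpressionType>();

// Records and prints each node, indented by depth. Only the node kinds
// used in this tree (lambda, binary, member access) are descended into.
void Walk (Expression e, int depth)
{
  nodeTypes.Add (e.NodeType);
  Console.WriteLine (new string (' ', depth * 2) + e.NodeType);
  switch (e)
  {
    case LambdaExpression l:
      Walk (l.Body, depth + 1);
      foreach (ParameterExpression p in l.Parameters) Walk (p, depth + 1);
      break;
    case BinaryExpression b:
      Walk (b.Left, depth + 1);
      Walk (b.Right, depth + 1);
      break;
    case MemberExpression m when m.Expression != null:
      Walk (m.Expression, depth + 1);
      break;
  }
}

Walk (f, 0);
// Lambda
//   LessThan
//     MemberAccess
//       Parameter
//     Constant
//   Parameter
```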


We continue this discussion online. 
LINQ Operators


This chapter describes each of the LINQ query operators. As well as serving as a
reference, two of the sections, “Projecting” on page 423 and “Joining” on page 423,
cover a number of conceptual areas: 


• Projecting object hierarchies
• Joining with Select, SelectMany, Join, and GroupJoin
• Query expressions with multiple range variables


All of the examples in this chapter assume that a names array is defined as follows: 
string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" }; 


Examples that query a database assume that a variable called dbContext is
instantiated as:


var dbContext = new NutshellContext(); 
where NutshellContext is defined as follows: 


public class NutshellContext : DbContext
{
  public DbSet<Customer> Customers { get; set; }
  public DbSet<Purchase> Purchases { get; set; }

  protected override void OnModelCreating (ModelBuilder modelBuilder)
  {
    modelBuilder.Entity<Customer> (entity =>
    {
      entity.ToTable ("Customer");
      entity.Property (e => e.Name).IsRequired();   // Column is not nullable
    });
    modelBuilder.Entity<Purchase> (entity =>
    {
      entity.ToTable ("Purchase");
      entity.Property (e => e.Date).IsRequired();
      entity.Property (e => e.Description).IsRequired();
    });
  }
}


public class Customer 


{ 
public int ID { get; set; } 
public string Name { get; set; } 


public virtual List<Purchase> Purchases { get; set; } 
= new List<Purchase>(); 


} 


public class Purchase 


{ 
public int ID { get; set; } 
public int? CustomerID { get; set; } 
public DateTime Date { get; set; } 
public string Description { get; set; } 
public decimal Price { get; set; } 


public virtual Customer Customer { get; set; }
}


All of the examples in this chapter are preloaded into LINQPad,
along with a sample database with a matching schema.
You can download LINQPad from http://www.linqpad.net.


Here are corresponding SQL Server table definitions: 


CREATE TABLE Customer ( 
ID int NOT NULL IDENTITY PRIMARY KEY, 
Name nvarchar(30) NOT NULL 

) 


CREATE TABLE Purchase ( 
ID int NOT NULL IDENTITY PRIMARY KEY, 
CustomerID int NOT NULL REFERENCES Customer(ID), 
Date datetime NOT NULL, 
Description nvarchar(30) NOT NULL, 
Price decimal NOT NULL 

) 


Overview 


In this section, we provide an overview of the standard query operators. They fall 
into three categories: 

• Sequence in, sequence out (sequence→sequence)
• Sequence in, single element or scalar value out
• Nothing in, sequence out (generation methods)
We first present each of the three categories and the query operators they include
and then we take up each individual query operator in detail. 


Sequence→Sequence


Most query operators fall into this category—accepting one or more sequences as 
input and emitting a single output sequence. Figure 9-1 illustrates those operators 
that restructure the shape of the sequences. 





Figure 9-1. Shape-changing operators (recoverable labels: Flat, Relational, Hierarchical, Select subquery)


Filtering 
IEnumerable<TSource>→IEnumerable<TSource>
Returns a subset of the original elements. 


Where, Take, TakeWhile, Skip, SkipWhile, Distinct 


Projecting 
IEnumerable<TSource>→IEnumerable<TResult>


Transforms each element with a lambda function. SelectMany flattens nested 
sequences; Select and SelectMany perform inner joins, left outer joins, cross joins, 
and non-equi joins with EF Core. 


Select, SelectMany 
Joining 
IEnumerable<TOuter>, IEnumerable<TInner>→IEnumerable<TResult>


Meshes elements of one sequence with another. Join and GroupJoin operators are 
designed to be efficient with local queries and support inner and left outer joins. 
The Zip operator enumerates two sequences in step, applying a function over each 
element pair. Rather than naming the type arguments TOuter and TInner, the Zip
operator names them TFirst and TSecond: 


IEnumerable<TFirst>, IEnumerable<TSecond>→IEnumerable<TResult>


Join, GroupJoin, Zip 
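A minimal standalone sketch of Zip (C# 9 top-level statements; the arrays are illustrative): elements are paired by position, and enumeration stops when the shorter sequence is exhausted:

```csharp
using System;
using System.Linq;

int[] numbers = { 3, 5, 7 };
string[] words = { "three", "five", "seven", "ignored" };

// The result selector combines each (TFirst, TSecond) pair into a TResult.
string[] zipped = numbers.Zip (words, (n, w) => n + "=" + w).ToArray();

Console.WriteLine (string.Join (" ", zipped));   // 3=three 5=five 7=seven
```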


Ordering 
IEnumerable<TSource>→IOrderedEnumerable<TSource>
Returns a reordering of a sequence. 


OrderBy, OrderByDescending, ThenBy, ThenByDescending, Reverse 


Grouping 
IEnumerable<TSource>→IEnumerable<IGrouping<TKey,TElement>>
Groups a sequence into subsequences. 


GroupBy 
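To make the ordering and grouping signatures concrete, here is a standalone sketch (C# 9 top-level statements) over the chapter's names array:

```csharp
using System;
using System.Linq;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// Ordering: primary key is Length, secondary key is the name itself.
IOrderedEnumerable<string> ordered = names.OrderBy (n => n.Length).ThenBy (n => n);
Console.WriteLine (string.Join (", ", ordered));   // Jay, Tom, Dick, Mary, Harry

// Grouping: each IGrouping<int,string> is a sequence that also carries a Key.
var byLength = names.GroupBy (n => n.Length).ToArray();
foreach (IGrouping<int, string> g in byLength)
  Console.WriteLine (g.Key + ": " + string.Join (", ", g));
// 3: Tom, Jay
// 4: Dick, Mary
// 5: Harry
```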


Set operators 
IEnumerable<TSource>, IEnumerable<TSource>→IEnumerable<TSource>
Takes two same-typed sequences and returns their commonality, sum, or difference. 


Concat, Union, Intersect, Except 
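The four set operators side by side, as a standalone sketch (C# 9 top-level statements) over two small illustrative arrays:

```csharp
using System;
using System.Linq;

int[] seq1 = { 1, 2, 3 };
int[] seq2 = { 3, 4, 5 };

int[] concat    = seq1.Concat (seq2).ToArray();      // 1, 2, 3, 3, 4, 5  (keeps duplicates)
int[] union     = seq1.Union (seq2).ToArray();       // 1, 2, 3, 4, 5     (removes duplicates)
int[] intersect = seq1.Intersect (seq2).ToArray();   // 3
int[] except    = seq1.Except (seq2).ToArray();      // 1, 2

Console.WriteLine (string.Join (", ", union));       // 1, 2, 3, 4, 5
```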


Conversion methods: Import 
IEnumerable→IEnumerable<TResult>


OfType, Cast 


Conversion methods: Export 
IEnumerable<TSource>→An array, list, dictionary, lookup, or sequence


ToArray, ToList, ToDictionary, ToLookup, AsEnumerable, AsQueryable 
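A standalone sketch (C# 9 top-level statements; the collections are illustrative) of both directions: OfType and Cast import a nongeneric IEnumerable (an ArrayList here), and the To* methods export to concrete collections:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Import: a nongeneric IEnumerable becomes an IEnumerable<T>.
ArrayList mixed = new ArrayList { 1, "two", 3 };
int[] onlyInts = mixed.OfType<int>().ToArray();        // 1, 3  (non-int elements are skipped)

ArrayList allInts = new ArrayList { 1, 2, 3 };
int[] castInts = allInts.Cast<int>().ToArray();        // 1, 2, 3  (Cast would throw on a non-int)

// Export: materialize into concrete collections.
string[] names = { "Tom", "Dick", "Harry" };
List<string> list = names.ToList();
Dictionary<string, int> lengths = names.ToDictionary (n => n, n => n.Length);
ILookup<int, string> byLength = names.ToLookup (n => n.Length);

Console.WriteLine (lengths["Harry"]);                  // 5
Console.WriteLine (string.Join (", ", byLength[4]));   // Dick
```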


Sequence→Element or Value


The following query operators accept an input sequence and emit a single element 
or value. 


Element operators 
IEnumerable<TSource>→TSource


Picks a single element from a sequence. 


First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault, 
ElementAt, ElementAtOrDefault, DefaultIfEmpty 
Aggregation methods
IEnumerable<TSource>→scalar


Performs a computation across a sequence, returning a scalar value (typically a 
number). 


Aggregate, Average, Count, LongCount, Sum, Max, Min 


Quantifiers 


IEnumerable<TSource>→bool
An aggregation returning true or false. 


All, Any, Contains, SequenceEqual
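A few operators from each of these three categories, as a standalone sketch (C# 9 top-level statements) over an illustrative array:

```csharp
using System;
using System.Linq;

int[] numbers = { 10, 9, 8, 7, 6 };

// Element operators
int first  = numbers.First();                          // 10
int last   = numbers.Last();                           // 6
int third  = numbers.ElementAt (2);                    // 8
int big    = numbers.FirstOrDefault (n => n > 100);    // 0 (default(int): no match)

// Aggregation methods
int count   = numbers.Count();                         // 5
int sum     = numbers.Sum();                           // 40
double avg  = numbers.Average();                       // 8
int max     = numbers.Max();                           // 10
int product = numbers.Aggregate ((a, b) => a * b);     // 30240

// Quantifiers
bool anyEven     = numbers.Any (n => n % 2 == 0);      // True
bool allPositive = numbers.All (n => n > 0);           // True
bool hasNine     = numbers.Contains (9);               // True
bool same        = numbers.SequenceEqual (new[] { 10, 9, 8, 7, 6 });   // True

Console.WriteLine ($"{first} {last} {sum} {avg}");     // 10 6 40 8
```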


Void→Sequence


In the third and final category are query operators that produce an output sequence 
from scratch. 


Generation methods 
void→IEnumerable<TResult>
Manufactures a simple sequence. 


Empty, Range, Repeat 
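The three generation methods are ordinary static methods on Enumerable rather than extension methods. A minimal standalone sketch (C# 9 top-level statements):

```csharp
using System;
using System.Linq;

int[] empty   = Enumerable.Empty<int>().ToArray();      // (no elements)
int[] range   = Enumerable.Range (5, 3).ToArray();      // 5, 6, 7  (start, count)
bool[] repeat = Enumerable.Repeat (true, 2).ToArray();  // True, True

Console.WriteLine (string.Join (", ", range));          // 5, 6, 7
```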


Filtering 


IEnumerable<TSource>→IEnumerable<TSource>


Method | Description | SQL equivalents
Where | Returns a subset of elements that satisfy a given condition | WHERE
Take | Returns the first count elements and discards the rest | WHERE ROW_NUMBER()... or TOP n subquery
Skip | Ignores the first count elements and returns the rest | WHERE ROW_NUMBER()... or NOT IN (SELECT TOP n...)
TakeWhile | Emits elements from the input sequence until the predicate is false | Exception thrown
SkipWhile | Ignores elements from the input sequence until the predicate is false, and then emits the rest | Exception thrown
Distinct | Returns a sequence that excludes duplicates | SELECT DISTINCT...
The SQL equivalents column in the reference tables in this
chapter does not necessarily correspond to what an 
IQueryable implementation such as EF Core will produce. 
Rather, it indicates what you'd typically use to do the same job 
if you were writing the SQL query yourself. Where there is no 
simple translation, the column is left blank. Where there is no 
translation at all, the column reads Exception thrown. 


Enumerable implementation code, when shown, excludes 
checking for null arguments and indexing predicates. 


With each of the filtering methods, you always end up with either the same number 
or fewer elements than you started with. You can never get more! The elements are 
also identical when they come out; they are not transformed in any way. 


Where 


Argument | Type
Source sequence | IEnumerable<TSource>
Predicate | TSource => bool or (TSource,int) => bool (a)

(a) Prohibited with LINQ to SQL and Entity Framework


Query syntax 


where bool-expression 


Enumerable.Where implementation 


The internal implementation of Enumerable.Where, null checking aside, is
functionally equivalent to the following:


public static IEnumerable<TSource> Where<TSource>
  (this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
  foreach (TSource element in source)
    if (predicate (element))
      yield return element;
}


Overview 
Where returns the elements from the input sequence that satisfy the given predicate. 
For instance: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" }; 
IEnumerable<string> query = names.Where (name => name.EndsWith ("y"));


// Harry 
// Mary 
// Jay 
In query syntax:


IEnumerable<string> query = from n in names
                            where n.EndsWith ("y")
                            select n;


A where clause can appear more than once in a query and be interspersed with let, 
orderby, and join clauses: 

from n in names
where n.Length > 3
let u = n.ToUpper()
where u.EndsWith ("Y")
select u;


// HARRY 
// MARY 


Standard C# scoping rules apply to such queries. In other words, you cannot refer
to a variable prior to declaring it with a range variable or a let clause.


Indexed filtering 


Where’s predicate optionally accepts a second argument, of type int. This is fed with 
the position of each element within the input sequence, allowing the predicate to 
use this information in its filtering decision. For example, the following skips every 
second element: 


IEnumerable<string> query = names.Where ((n, i) => i % 2 == 0); 


// Tom 
// Harry 
// Jay 


An exception is thrown if you use indexed filtering in EF Core. 


SQL LIKE comparisons in EF Core 
The following methods on string translate to SQL's LIKE operator:
Contains, StartsWith, EndsWith 


For instance, c.Name.Contains ("abc") translates to customer.Name LIKE 
'%abc%' (or more accurately, a parameterized version of this). Contains lets you 
compare only against a locally evaluated expression; to compare against another 
column, you must use the EF.Functions.Like method: 


... where EF.Functions.Like (c.Description, "%" + c.Name + "%")


EF.Functions.Like also lets you perform more complex comparisons (e.g., LIKE
'abc%def%').
< and > string comparisons in EF Core


You can perform order comparison on strings with string’s CompareTo method; this
maps to SQL's < and > operators:


dbContext.Purchases.Where (p => p.Description.CompareTo ("C") < 0) 


WHERE x IN (..., ..., ...) in EF Core


With EF Core, you can apply the Contains operator to a local collection within a 
filter predicate; for instance: 


string[] chosenOnes = { "Tom", "Jay" }; 
from c in dbContext.Customers 
where chosenOnes.Contains (c.Name) 

This maps to SQL's IN operator; in other words:


WHERE customer.Name IN ('Tom', 'Jay')


If the local collection is an array of entities or nonscalar types, EF Core might 
instead emit an EXISTS clause. 


Take and Skip 


Argument | Type
Source sequence | IEnumerable<TSource>
Number of elements to take or skip | int





Take emits the first n elements and discards the rest; Skip discards the first n
elements and emits the rest. The two methods are useful together when implementing
a web page allowing a user to navigate through a large set of matching records. For 
instance, suppose that a user searches a book database for the term mercury and 
there are 100 matches. The following returns the first 20: 


IQueryable<Book> query = dbContext.Books
  .Where (b => b.Title.Contains ("mercury"))
  .OrderBy (b => b.Title)
  .Take (20);


The next query returns books 21 to 40: 


IQueryable<Book> query = dbContext.Books
  .Where (b => b.Title.Contains ("mercury"))
  .OrderBy (b => b.Title)
  .Skip (20).Take (20);


EF Core translates Take and Skip to the ROW_NUMBER function in SQL Server 2005, 
or a TOP n subquery in earlier versions of SQL Server. 
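The same Skip/Take arithmetic can be wrapped in a paging helper. In this standalone sketch (C# 9 top-level statements), GetPage is a hypothetical helper with a zero-based page index, and Enumerable.Range stands in for the 100 matching books; against a database you would keep the OrderBy shown above so that pages are stable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pageIndex is zero-based.
static IEnumerable<T> GetPage<T> (IEnumerable<T> source, int pageIndex, int pageSize) =>
  source.Skip (pageIndex * pageSize).Take (pageSize);

IEnumerable<int> matches = Enumerable.Range (1, 100);   // stand-in for 100 results

int[] page1 = GetPage (matches, 0, 20).ToArray();       // 1 to 20
int[] page2 = GetPage (matches, 1, 20).ToArray();       // 21 to 40

Console.WriteLine ($"{page2.First()}..{page2.Last()}"); // 21..40
```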
TakeWhile and SkipWhile


Argument | Type
Source sequence | IEnumerable<TSource>
Predicate | TSource => bool or (TSource,int) => bool





TakeWhile enumerates the input sequence, emitting each item until the given
predicate is false. It then ignores the remaining elements:


int[] numbers = { 3, 5, 2, 234, 4, 1 }; 
var takeWhileSmall = numbers.TakeWhile (n => n < 100);   // { 3, 5, 2 }


SkipWhile enumerates the input sequence, ignoring each item until the given
predicate is false. It then emits the remaining elements:


int[] numbers = {3, 5, 2, 234, 4, 1 }; 
var skipWhileSmall = numbers.SkipWhile (n => n < 100); // { 234, 4, 1 } 


TakeWhile and SkipWhile have no translation to SQL and throw an exception if
used in an EF Core query. 
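TakeWhile differs from Where in that it stops at the first element that fails the predicate rather than testing every element. A standalone sketch (C# 9 top-level statements) using the same numbers, including the indexed overload:

```csharp
using System;
using System.Linq;

int[] numbers = { 3, 5, 2, 234, 4, 1 };

// Where tests every element; TakeWhile stops at the first element that fails.
int[] filtered  = numbers.Where (n => n < 100).ToArray();       // 3, 5, 2, 4, 1
int[] takeWhile = numbers.TakeWhile (n => n < 100).ToArray();   // 3, 5, 2

// The indexed overload also receives each element's position.
int[] firstTwoSmall = numbers.TakeWhile ((n, i) => n < 100 && i < 2).ToArray();   // 3, 5

Console.WriteLine (string.Join (", ", filtered));               // 3, 5, 2, 4, 1
```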


Distinct 
Distinct returns the input sequence, stripped of duplicates. You can optionally pass
in a custom equality comparer. The following returns distinct letters in a string:


char[] distinctLetters = "HelloWorld".Distinct().ToArray(); 
string s = new string (distinctLetters); // HeloWrd 


We can call LINQ methods directly on a string because string implements 
IEnumerable<char>. 
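To supply a custom equality comparer, pass any IEqualityComparer<T> to Distinct. In this standalone sketch (C# 9 top-level statements, with an illustrative array), StringComparer.OrdinalIgnoreCase makes the comparison case-insensitive:

```csharp
using System;
using System.Linq;

string[] names = { "Tom", "tom", "Dick", "DICK", "Harry" };

// Default string equality is case-sensitive: all five are distinct.
string[] caseSensitive = names.Distinct().ToArray();

// A custom comparer treats "Tom" and "tom" as duplicates;
// LINQ to Objects keeps the first occurrence of each.
string[] caseInsensitive = names.Distinct (StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine (string.Join (", ", caseInsensitive));   // Tom, Dick, Harry
```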


Projecting 


IEnumerable<TSource>→IEnumerable<TResult>


Method | Description | SQL equivalents
Select | Transforms each input element with the given lambda expression | SELECT
SelectMany | Transforms each input element, and then flattens and concatenates the resultant subsequences | INNER JOIN, LEFT OUTER JOIN, CROSS JOIN





When querying a database, Select and SelectMany are the 
most versatile joining constructs; for local queries, Join and 
GroupJoin are the most efficient joining constructs. 
Select


Argument | Type
Source sequence | IEnumerable<TSource>
Result selector | TSource => TResult or (TSource,int) => TResult (a)

(a) Prohibited with EF Core


Query syntax 


select projection-expression 


Enumerable implementation 


public static IEnumerable<TResult> Select<TSource, TResult>
  (this IEnumerable<TSource> source, Func<TSource,TResult> selector)
{
  foreach (TSource element in source)
    yield return selector (element);
}


Overview 


With Select, you always get the same number of elements that you started with. 
Each element, however, can be transformed in any manner by the lambda function. 


The following selects the names of all fonts installed on the computer (from 
System.Drawing): 


IEnumerable<string> query = from f in FontFamily.Families
select f.Name; 


foreach (string name in query) Console.WriteLine (name); 


In this example, the select clause converts a FontFamily object to its name. Here's 
the lambda equivalent: 


IEnumerable<string> query = FontFamily.Families.Select (f => f.Name); 
Select statements are often used to project into anonymous types: 


var query = 
from f in FontFamily.Families 
select new { f.Name, LineSpacing = f.GetLineSpacing (FontStyle.Bold) }; 


A projection with no transformation is sometimes used with query syntax, in order to satisfy the requirement that the query end in a select or group clause. The following selects fonts supporting strikeout:


IEnumerable<FontFamily> query =
  from f in FontFamily.Families
  where f.IsStyleAvailable (FontStyle.Strikeout)
  select f;

foreach (FontFamily ff in query) Console.WriteLine (ff.Name);


In such cases, the compiler omits the projection when translating to fluent syntax. 


Indexed projection 


The selector expression can optionally accept an integer argument, which acts as 
an indexer, providing the expression with the position of each input in the input 
sequence. This works only with local queries: 


string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

IEnumerable<string> query = names
  .Select ((s,i) => i + "=" + s);     //  { "0=Tom", "1=Dick", ... }


Select subqueries and object hierarchies 


You can nest a subquery in a select clause to build an object hierarchy. The following example returns a collection describing each directory under Path.GetTempPath(), with a subcollection of files under each directory:


string tempPath = Path.GetTempPath();
DirectoryInfo[] dirs = new DirectoryInfo (tempPath).GetDirectories();

var query =
  from d in dirs
  where (d.Attributes & FileAttributes.System) == 0
  select new
  {
    DirectoryName = d.FullName,
    Created = d.CreationTime,

    Files = from f in d.GetFiles()
            where (f.Attributes & FileAttributes.Hidden) == 0
            select new { FileName = f.Name, f.Length, }
  };

foreach (var dirFiles in query)
{
  Console.WriteLine ("Directory: " + dirFiles.DirectoryName);
  foreach (var file in dirFiles.Files)
    Console.WriteLine ("  " + file.FileName + " Len: " + file.Length);
}


The inner portion of this query can be called a correlated subquery. A subquery is 
correlated if it references an object in the outer query—in this case, it references d, 
the directory being enumerated. 


A subquery inside a Select allows you to map one object 
hierarchy to another, or map a relational object model to a 
hierarchical object model. 
With local queries, a subquery within a Select causes double-deferred execution. In
our example, the files aren't filtered or projected until the inner foreach statement 
enumerates. 
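A small local sketch makes the double-deferred behavior visible. It uses C# 9 top-level statements. The `dirs` array of file sizes and the counting helper `IsPositive` are hypothetical stand-ins for the directory example. The counter shows that materializing the outer query runs none of the inner filters. They run only when each inner sequence is itself enumerated.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for DirectoryInfo[]: each "directory" has file sizes.
var dirs = new[]
{
    new { Name = "A", Sizes = new[] { 10, 0, 30 } },
    new { Name = "B", Sizes = new[] { 5 } }
};

int innerRuns = 0;                              // how many times the inner filter ran
bool IsPositive (int size) { innerRuns++; return size > 0; }

var query =
    from d in dirs
    select new
    {
        d.Name,
        NonEmpty = from s in d.Sizes            // correlated subquery (references d)
                   where IsPositive (s)
                   select s
    };

var outer = query.ToList();                     // enumerates only the outer query
int runsAfterOuter = innerRuns;                 // 0: no subquery has executed yet

// Enumerating each NonEmpty sequence finally runs the inner filter.
int total = outer.Sum (x => x.NonEmpty.Count());

Console.WriteLine (runsAfterOuter);             // 0
Console.WriteLine (innerRuns);                  // 4
Console.WriteLine (total);                      // 3
```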


Subqueries and joins in EF Core 


Subquery projections work well in EF Core, and you can use them to do the work of SQL-style joins. Here’s how we retrieve each customer’s name along with their high-value purchases:


var query =
  from c in dbContext.Customers
  select new {
               c.Name,
               Purchases = (from p in dbContext.Purchases
                            where p.CustomerID == c.ID && p.Price > 1000
                            select new { p.Description, p.Price })
                           .ToList()
             };

foreach (var namePurchases in query)
{
  Console.WriteLine ("Customer: " + namePurchases.Name);
  foreach (var purchaseDetail in namePurchases.Purchases)
    Console.WriteLine ("   - $$$: " + purchaseDetail.Price);
}


Note the use of ToList in the subquery. EF Core 3 cannot create queryables from the subquery result when that subquery references the DbContext. This issue is being tracked by the EF Core team and might be resolved in a future release.


This query matches up objects from two disparate collections, and it can be thought of as a “Join.” The difference between this and a conventional database join (or subquery) is that we're not flattening the output into a single two-dimensional result set. We're mapping the relational data to hierarchical data, rather than to flat data.


This style of query is ideally suited to interpreted queries. The outer query and subquery are processed as a unit, avoiding unnecessary round-tripping. With local queries, however, it’s inefficient because every combination of outer and inner elements must be enumerated to get the few matching combinations. A better choice for local queries is Join or GroupJoin, described in the following sections.


Here's the same query simplified by using the Purchases collection navigation property on the Customer entity:


from c in dbContext.Customers
select new
{
  c.Name,
  Purchases = from p in c.Purchases        // Purchases is List<Purchase>
              where p.Price > 1000
              select new { p.Description, p.Price }
};
(EF Core 3 does not require ToList when performing the subquery on a navigation 
property.) 


Both queries are analogous to a left outer join in SQL in the sense that we get all 
customers in the outer enumeration, regardless of whether they have any purchases. 
To emulate an inner join—whereby customers without high-value purchases are 
excluded—we would need to add a filter condition on the purchases collection: 


from c in dbContext.Customers
where c.Purchases.Any (p => p.Price > 1000)
select new {
             c.Name,
             Purchases = from p in c.Purchases
                         where p.Price > 1000
                         select new { p.Description, p.Price }
           };


This is slightly untidy, however, in that we've written the same predicate (Price > 
1000) twice. We can avoid this duplication with a let clause: 


from c in dbContext.Customers 
let highValueP = from p in c.Purchases 
where p.Price > 1000 
select new { p.Description, p.Price } 
where highValueP.Any() 
select new { c.Name, Purchases = highValueP }; 


This style of query is flexible. By changing Any to Count, for instance, we can modify 
the query to retrieve only customers with at least two high-value purchases: 


where highValueP.Count() >= 2 
select new { c.Name, Purchases = highValueP }; 
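The let-based query also works locally. The following sketch uses C# 9 top-level statements. Hypothetical in-memory arrays of anonymous objects take the place of dbContext.Customers and the purchases. With no navigation property available, the correlation on CustomerID is written out explicitly inside the let subquery.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data standing in for the database tables.
var customers = new[]
{
    new { ID = 1, Name = "Tom" },
    new { ID = 2, Name = "Dick" },
    new { ID = 3, Name = "Harry" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike",    Price = 1500m },
    new { CustomerID = 1, Description = "Holiday", Price = 2000m },
    new { CustomerID = 2, Description = "Phone",   Price = 1200m },
    new { CustomerID = 3, Description = "Book",    Price = 20m }
};

var query =
    from c in customers
    let highValueP = from p in purchases
                     where p.CustomerID == c.ID && p.Price > 1000   // predicate written once
                     select new { p.Description, p.Price }
    where highValueP.Count() >= 2            // Any() here would also admit Dick
    select new { c.Name, Purchases = highValueP.ToList() };

var results = query.ToList();
foreach (var r in results)
    Console.WriteLine (r.Name + ": " + r.Purchases.Count);   // Tom: 2
```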


Projecting into concrete types 


In the examples so far, we've instantiated anonymous types in the output. It can also 
be useful to instantiate (ordinary) named classes, which you populate with object 
initializers. Such classes can include custom logic and be passed between methods 
and assemblies without using type information. 


A typical example is a custom business entity. A custom business entity is simply a 
class that you write with some properties but designed to hide lower-level 
(database-related) details. You might exclude foreign key fields from business-entity 
classes, for instance. Assuming that we wrote custom entity classes called CustomerEntity and PurchaseEntity, here's how we could project into them:


IQueryable<CustomerEntity> query =
  from c in dbContext.Customers
  select new CustomerEntity
  {
    Name = c.Name,
    Purchases =
      (from p in c.Purchases
       where p.Price > 1000
       select new PurchaseEntity {
                                   Description = p.Description,
                                   Value = p.Price
                                 }
      ).ToList()
  };

// Force query execution, converting output to a more convenient List:
List<CustomerEntity> result = query.ToList();


When created to transfer data between tiers in a program or 
between separate systems, custom business entity classes are 
often called data transfer objects (DTO). DTOs contain no 
business logic. 


Notice that so far, we've not had to use a Join or SelectMany statement. This is because we're maintaining the hierarchical shape of the data, as illustrated in Figure 9-2. With LINQ, you can often avoid the traditional SQL approach of flattening tables into a two-dimensional result set.
Figure 9-2. Projecting an object hierarchy


SelectMany 


Argument        | Type
Source sequence | IEnumerable<TSource>
Result selector | TSource => IEnumerable<TResult>
                | or (TSource,int) => IEnumerable<TResult>*

* Prohibited with EF Core

Query syntax


from identifier1 in enumerable-expression1 
from identifier2 in enumerable-expression2 


Enumerable implementation 


public static IEnumerable<TResult> SelectMany<TSource, TResult>
  (this IEnumerable<TSource> source,
   Func <TSource,IEnumerable<TResult>> selector)
{
  foreach (TSource element in source)
    foreach (TResult subElement in selector (element))
      yield return subElement;
}


Overview 
SelectMany concatenates subsequences into a single flat output sequence. 


Recall that for each input element, Select yields exactly one output element. In 
contrast, SelectMany yields 0..n output elements. The 0..n elements come from a 
subsequence or child sequence that the lambda expression must emit. 


You can use SelectMany to expand child sequences, flatten nested collections, and 
join two collections into a flat output sequence. Using the conveyor belt analogy, 
SelectMany funnels fresh material onto a conveyor belt. With SelectMany, each 
input element is the trigger for the introduction of fresh material. The fresh material 
is emitted by the selector lambda expression and must be a sequence. In other 
words, the lambda expression must emit a child sequence per input element. The 
final result is a concatenation of the child sequences emitted for each input element. 


Starting with a simple example, suppose that we have the following array of names:

string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };

that we want to convert to a single flat collection of words, in other words:

"Anne", "Williams", "John", "Fred", "Smith", "Sue", "Green"


SelectMany is ideal for this task, because we're mapping each input element to a 
variable number of output elements. All we must do is come up with a selector 
expression that converts each input element to a child sequence. string.Split does 
the job nicely: it takes a string and splits it into words, emitting the result as an 
array: 


string testInputElement = "Anne Williams"; 
string[] childSequence = testInputElement.Split(); 


// childSequence is { "Anne", "Williams" }; 


So, here’s our SelectMany query and the result: 
IEnumerable<string> query = fullNames.SelectMany (name => name.Split());


foreach (string name in query) 
Console.Write (name + "|");  // Anne|Williams|John|Fred|Smith|Sue|Green| 


If you replace SelectMany with Select, you get the same 
results in hierarchical form. The following emits a sequence of 
string arrays, requiring nested foreach statements to 
enumerate: 


IEnumerable<string[]> query =
  fullNames.Select (name => name.Split());

foreach (string[] stringArray in query)
  foreach (string name in stringArray)
    Console.Write (name + "|");

The benefit of SelectMany is that it yields a single flat result
sequence.
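Counting the elements confirms the difference in shape. This sketch uses C# 9 top-level statements and reuses the fullNames array from the text. Select yields one string[] per name, three in all. SelectMany concatenates those arrays into seven words.

```csharp
using System;
using System.Linq;

string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };

// Select: exactly one output element (a string[]) per input element.
string[][] nested = fullNames.Select (name => name.Split()).ToArray();

// SelectMany: the child arrays are concatenated into one flat sequence.
string[] flat = fullNames.SelectMany (name => name.Split()).ToArray();

Console.WriteLine (nested.Length);              // 3
Console.WriteLine (flat.Length);                // 7
Console.WriteLine (string.Join ("|", flat));    // Anne|Williams|John|Fred|Smith|Sue|Green
```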


SelectMany is supported in query syntax and is invoked by having an additional 
generator—in other words, an extra from clause in the query. The from keyword has 
two meanings in query syntax. At the start of a query, it introduces the original 
range variable and input sequence. Anywhere else in the query, it translates to 
SelectMany. Here’s our query in query syntax: 


IEnumerable<string> query = 
from fullName in fullNames 
from name in fullName.Split() // Translates to SelectMany 
select name; 


Note that the additional generator introduces a new range variable—in this case, 
name. The old range variable stays in scope, however, and we can subsequently 
access both. 


Multiple range variables 


In the preceding example, both name and fullName remain in scope until the query 
either ends or reaches an into clause. The extended scope of these variables is the 
killer scenario for query syntax over fluent syntax. 


To illustrate, we can take the preceding query and include fullName in the final 
projection: 


IEnumerable<string> query = 
from fullName in fullNames 
from name in fullName.Split() 
select name + " came from " + fullName; 


Anne came from Anne Williams 
Williams came from Anne Williams 
John came from John Fred Smith 
Behind the scenes, the compiler must pull some tricks to let you access both variables. A good way to appreciate this is to try writing the same query in fluent syntax. It's tricky! It becomes yet more difficult if you insert a where or orderby clause before projecting:


from fullName in fullNames
from name in fullName.Split()
orderby fullName, name
select name + " came from " + fullName;


The problem is that SelectMany emits a flat sequence of child elements—in our 
case, a flat collection of words. The original “outer” element from which it came 
(fullName) is lost. The solution is to “carry” the outer element with each child, in a 
temporary anonymous type: 


from fullName in fullNames
from x in fullName.Split().Select (name => new { name, fullName } )
orderby x.fullName, x.name
select x.name + " came from " + x.fullName;


The only change here is that we’re wrapping each child element (name) in an anonymous type that also contains its fullName. This is similar to how a let clause is resolved. Here's the final conversion to fluent syntax:


IEnumerable<string> query = fullNames
  .SelectMany (fName => fName.Split()
                             .Select (name => new { name, fName } ))
  .OrderBy (x => x.fName)
  .ThenBy  (x => x.name)
  .Select  (x => x.name + " came from " + x.fName);
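To check that the hand-written fluent form matches the query-syntax version, the sketch below runs both over the fullNames array and compares the output. It uses C# 9 top-level statements. The variable names viaQuery and viaFluent are illustrative only. Both forms sort with the same default string comparer, so the sequences should be identical.

```csharp
using System;
using System.Linq;

string[] fullNames = { "Anne Williams", "John Fred Smith", "Sue Green" };

// Query syntax: fullName and name both stay in scope.
string[] viaQuery =
   (from fullName in fullNames
    from name in fullName.Split()
    orderby fullName, name
    select name + " came from " + fullName).ToArray();

// Fluent syntax: the outer element is carried along in an anonymous type.
string[] viaFluent = fullNames
    .SelectMany (fName => fName.Split().Select (name => new { name, fName }))
    .OrderBy (x => x.fName)
    .ThenBy  (x => x.name)
    .Select  (x => x.name + " came from " + x.fName)
    .ToArray();

Console.WriteLine (viaQuery.SequenceEqual (viaFluent));   // True
Console.WriteLine (viaQuery[0]);                          // Anne came from Anne Williams
```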


Thinking in query syntax 


As we just demonstrated, there are good reasons to use query syntax if you need 
multiple range variables. In such cases, it helps not only to use query syntax, but 
also to think directly in its terms. 


There are two basic patterns when writing additional generators. The first is 
expanding and flattening subsequences. To do this, you call a property or method on 
an existing range variable in your additional generator. We did this in the previous 
example: 


from fullName in fullNames 
from name in fullName.Split() 


Here, we've expanded from enumerating full names to enumerating words. An analogous EF Core query is when you expand collection navigation properties. The following query lists all customers along with their purchases:


IEnumerable<string> query = from c in dbContext.Customers
                            from p in c.Purchases
                            select c.Name + " bought a " + p.Description;

Tom bought a Bike
Tom bought a Holiday
Dick bought a Phone
Harry bought a Car


Here, we've expanded each customer into a subsequence of purchases. 


The second pattern is performing a cartesian product, or cross join, in which every 
element of one sequence is matched with every element of another. To do this, 
introduce a generator whose selector expression returns a sequence unrelated to a 
range variable: 


int[] numbers = { 1, 2, 3 };  string[] letters = { "a", "b" };

IEnumerable<string> query = from n in numbers
                            from l in letters
                            select n.ToString() + l;

// RESULT: { "1a", "1b", "2a", "2b", "3a", "3b" }
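In fluent syntax, an additional generator corresponds to the SelectMany overload that takes both a collection selector and a result selector. Per the C# query-expression translation rules, `from n in numbers from l in letters select ...` becomes a call of that shape. The sketch below uses C# 9 top-level statements and checks that both forms agree. The collection selector ignores n, and that is what makes this a cross join.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 2, 3 };
string[] letters = { "a", "b" };

// Query syntax from the text.
string[] viaQuery = (from n in numbers
                     from l in letters
                     select n.ToString() + l).ToArray();

// Fluent form: collection selector (unrelated to n) plus result selector.
string[] viaFluent = numbers
    .SelectMany (n => letters, (n, l) => n.ToString() + l)
    .ToArray();

Console.WriteLine (string.Join (",", viaFluent));   // 1a,1b,2a,2b,3a,3b
```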


This style of query is the basis of SelectMany-style joins. 


Joining with SelectMany 


You can use SelectMany to join two sequences simply by filtering the results of a 
cross product. For instance, suppose that we want to match players for a game. We 
could start as follows: 


string[] players = { "Tom", "Jay", "Mary" };

IEnumerable<string> query = from name1 in players
                            from name2 in players
                            select name1 + " vs " + name2;

// RESULT: { "Tom vs Tom", "Tom vs Jay", "Tom vs Mary",
//           "Jay vs Tom", "Jay vs Jay", "Jay vs Mary",
//           "Mary vs Tom", "Mary vs Jay", "Mary vs Mary" }


The query reads: “For every player, reiterate every player, selecting player 1 versus 
player 2.” Although we got what we asked for (a cross join), the results are not useful 
until we add a filter: 


IEnumerable<string> query = from name1 in players
                            from name2 in players
                            where name1.CompareTo (name2) < 0
                            orderby name1, name2
                            select name1 + " vs " + name2;

// RESULT: { "Jay vs Mary", "Jay vs Tom", "Mary vs Tom" }


The filter predicate constitutes the join condition. Our query can be called a non-equi join, because the join condition doesn't use an equality operator.
SelectMany in EF Core


SelectMany in EF Core can perform cross joins, non-equi joins, inner joins, and left 
outer joins. You can use SelectMany with both predefined associations and ad hoc 
relationships—just as with Select. The difference is that SelectMany returns a flat 
rather than a hierarchical result set. 


An EF Core cross join is written just as in the preceding section. The following 
query matches every customer to every purchase (a cross join): 


var query = from c in dbContext.Customers
            from p in dbContext.Purchases
            select c.Name + " might have bought a " + p.Description;


More typically, though, you'd want to match customers to only their own purchases. You achieve this by adding a where clause with a joining predicate. This results in a standard SQL-style equi-join:


var query = from c in dbContext.Customers
            from p in dbContext.Purchases
            where c.ID == p.CustomerID
            select c.Name + " bought a " + p.Description;


This translates well to SQL. In the next section, we see how it extends to support outer joins. Reformulating such queries with LINQ’s Join operator actually makes them less extensible; LINQ is opposite to SQL in this sense.


If you have collection navigation properties in your entities, you can express the 
same query by expanding the subcollection instead of filtering the cross product: 


from c in dbContext.Customers 
from p in c.Purchases 
select new { c.Name, p.Description }; 


The advantage is that we've eliminated the joining predicate. We've gone from filtering a cross product to expanding and flattening.


You can add where clauses to such a query for additional filtering. For instance, if we want only customers whose names start with “T,” we could filter as follows:


from c in dbContext.Customers
where c.Name.StartsWith ("T")
from p in c.Purchases
select new { c.Name, p.Description };


This EF Core query would work equally well if the where clause were moved one 
line down because the same SQL is generated in both cases. If it is a local query, 
however, moving the where clause down would make it less efficient. With local 
queries, you should filter before joining. 


You can introduce new tables into the mix with additional from clauses. For instance, if each purchase had purchase item child rows, you could produce a flat result set of customers with their purchases, each with their purchase detail lines as follows:


from c in dbContext.Customers
from p in c.Purchases
from pi in p.PurchaseItems
select new { c.Name, p.Description, pi.Detail };


Each from clause introduces a new child table. To include data from a parent table 
(via a navigation property), you don't add a from clause—you simply navigate to the 
property. For example, if each customer has a salesperson whose name you want to 
query, just do this: 


from c in dbContext.Customers 
select new { Name = c.Name, SalesPerson = c.SalesPerson.Name }; 


You don't use SelectMany in this case, because there’s no subcollection to flatten. 
Parent navigation properties return a single item. 


Outer joins with SelectMany 


We saw previously that a Select subquery yields a result analogous to a left outer join:

from c in dbContext.Customers
select new {
             c.Name,
             Purchases = from p in c.Purchases
                         where p.Price > 1000
                         select new { p.Description, p.Price }
           };

In this example, every outer element (customer) is included, regardless of whether 
the customer has any purchases. But suppose that we rewrite this query with 
SelectMany so that we can obtain a single flat collection rather than a hierarchical 
result set: 


from c in dbContext.Customers
from p in c.Purchases
where p.Price > 1000
select new { c.Name, p.Description, p.Price };


In the process of flattening the query, we've switched to an inner join: customers are 
now included only for whom one or more high-value purchases exist. To get a left 
outer join with a flat result set, we must apply the DefaultIfEmpty query operator 
on the inner sequence. This method returns a sequence with a single null element if 
its input sequence has no elements. Here’s such a query, price predicate aside: 


from c in dbContext.Customers
from p in c.Purchases.DefaultIfEmpty()
select new { c.Name, p.Description, Price = (decimal?) p.Price };

This works perfectly with EF Core, returning all customers, even if they have no purchases. But if we were to run this as a local query, it would crash because when p is null, p.Description and p.Price throw a NullReferenceException. We can make our query robust in either scenario, as follows:


from c in dbContext.Customers
from p in c.Purchases.DefaultIfEmpty()
select new {
             c.Name,
             Descript = p == null ? null : p.Description,
             Price = p == null ? (decimal?) null : p.Price
           };


Let’s now reintroduce the price filter. We cannot use a where clause as we did before, 
because it would execute after DefaultIfEmpty: 


from c in dbContext.Customers 
from p in c.Purchases.DefaultIfEmpty() 
where p.Price > 1000... 


The correct solution is to splice the Where clause before DefaultIfEmpty with a 
subquery: 


from c in dbContext.Customers
from p in c.Purchases.Where (p => p.Price > 1000).DefaultIfEmpty()
select new {
             c.Name,
             Descript = p == null ? null : p.Description,
             Price = p == null ? (decimal?) null : p.Price
           };


EF Core translates this to a left outer join. This is an effective pattern for writing 
such queries. 
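The same Where-then-DefaultIfEmpty pattern can be tried as a local query. The sketch below uses C# 9 top-level statements. Hypothetical arrays of anonymous objects stand in for the customers and purchases, and the correlation on CustomerID is written out because there is no navigation property. Customers with no high-value purchase still appear once, with null fields.

```csharp
using System;
using System.Linq;

// Hypothetical local data; Harry has no purchases, Dick has only a cheap one.
var customers = new[]
{
    new { ID = 1, Name = "Tom" },
    new { ID = 2, Name = "Dick" },
    new { ID = 3, Name = "Harry" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike",  Price = 1500m },
    new { CustomerID = 2, Description = "Phone", Price = 300m }
};

var rows =
   (from c in customers
    from p in purchases.Where (x => x.CustomerID == c.ID && x.Price > 1000)
                       .DefaultIfEmpty()       // a single null when nothing matches
    select new
    {
        c.Name,
        Descript = p == null ? null : p.Description,
        Price = p == null ? (decimal?) null : p.Price
    }).ToList();

foreach (var row in rows)
    Console.WriteLine (row.Name + ": " + (row.Descript ?? "(none)"));
// Tom: Bike
// Dick: (none)
// Harry: (none)
```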


If you're used to writing outer joins in SQL, you might be tempted to overlook the simpler option of a Select subquery for this style of query in favor of the awkward but familiar SQL-centric flat approach. The hierarchical result set from a Select subquery is often better suited to outer join-style queries because there are no additional nulls to deal with.


Joining

Method    | Description                                                                                  | SQL equivalents
Join      | Applies a lookup strategy to match elements from two collections, emitting a flat result set | INNER JOIN
GroupJoin | Similar to Join, but emits a hierarchical result set                                         | INNER JOIN, LEFT OUTER JOIN
Zip       | Enumerates two sequences in step (like a zipper), applying a function over each element pair | Exception thrown
Join and GroupJoin


IEnumerable<TOuter>, IEnumerable<TInner> → IEnumerable<TResult>


Join arguments

Argument           | Type
Outer sequence     | IEnumerable<TOuter>
Inner sequence     | IEnumerable<TInner>
Outer key selector | TOuter => TKey
Inner key selector | TInner => TKey
Result selector    | (TOuter,TInner) => TResult





GroupJoin arguments

Argument           | Type
Outer sequence     | IEnumerable<TOuter>
Inner sequence     | IEnumerable<TInner>
Outer key selector | TOuter => TKey
Inner key selector | TInner => TKey
Result selector    | (TOuter,IEnumerable<TInner>) => TResult





Query syntax 


from outer-var in outer-enumerable 
join inner-var in inner-enumerable on outer-key-expr equals inner-key-expr 
[ into identifier ] 


Overview 


Join and GroupJoin mesh two input sequences into a single output sequence. Join 
emits flat output; GroupJoin emits hierarchical output. 


Join and GroupJoin provide an alternative strategy to Select and SelectMany. The advantage of Join and GroupJoin is that they execute efficiently over local in-memory collections because they first load the inner sequence into a keyed lookup, avoiding the need to repeatedly enumerate over every inner element. The disadvantage is that they offer the equivalent of inner and left outer joins only; cross joins and non-equi joins must still be done using Select/SelectMany. With EF Core queries, Join and GroupJoin offer no real benefits over Select and SelectMany.


Table 9-1 summarizes the differences between each of the joining strategies. 
Table 9-1. Joining strategies

Strategy               | Result shape | Local query efficiency | Inner joins | Left outer joins | Cross joins | Non-equi joins
Select + SelectMany    | Flat         | Bad                    | Yes         | Yes              | Yes         | Yes
Select + Select        | Nested       | Bad                    | Yes         | Yes              | Yes         | Yes
Join                   | Flat         | Good                   | Yes         | -                | -           | -
GroupJoin              | Nested       | Good                   | Yes         | Yes              | -           | -
GroupJoin + SelectMany | Flat         | Good                   | Yes         | Yes              | -           | -

Join 


The Join operator performs an inner join, emitting a flat output sequence. 


The following query lists all customers alongside their purchases without using a 
navigation property: 

IQueryable<string> query =
  from c in dbContext.Customers
  join p in dbContext.Purchases on c.ID equals p.CustomerID
  select c.Name + " bought a " + p.Description;


The results match what we would get from a SelectMany-style query: 


Tom bought a Bike 
Tom bought a Holiday 
Dick bought a Phone 
Harry bought a Car 


To see the benefit of Join over SelectMany, we must convert this to a local query. 
We can demonstrate this by first copying all customers and purchases to arrays and 
then querying the arrays: 


Customer[] customers = dbContext.Customers.ToArray(); 
Purchase[] purchases = dbContext.Purchases.ToArray(); 
var slowQuery = from c in customers 
from p in purchases where c.ID == p.CustomerID 
select c.Name + " bought a " + p.Description; 


var fastQuery = from c in customers 
join p in purchases on c.ID equals p.CustomerID 
select c.Name + " bought a " + p.Description; 


Although both queries yield the same results, the Join query is considerably faster 
because its implementation in Enumerable preloads the inner collection 
(purchases) into a keyed lookup. 
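The efficiency difference can be made visible by counting work instead of timing it. This sketch uses C# 9 top-level statements. The data and the counting helpers `SameId` and `Key` are hypothetical. The SelectMany-style query tests every customer/purchase pair. In the current Enumerable implementation, Join calls each key selector once per element while it builds and probes its lookup.

```csharp
using System;
using System.Linq;

// Hypothetical local data in place of dbContext.Customers / Purchases.
var customers = new[]
{
    new { ID = 1, Name = "Tom" },
    new { ID = 2, Name = "Dick" },
    new { ID = 3, Name = "Harry" },
    new { ID = 4, Name = "Mary" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike" },
    new { CustomerID = 1, Description = "Holiday" },
    new { CustomerID = 2, Description = "Phone" },
    new { CustomerID = 3, Description = "Car" }
};

int comparisons = 0;   // predicate calls made by the SelectMany-style query
bool SameId (int a, int b) { comparisons++; return a == b; }

int keyCalls = 0;      // key-selector calls made by the Join query
int Key (int id) { keyCalls++; return id; }

string[] slow = (from c in customers
                 from p in purchases
                 where SameId (c.ID, p.CustomerID)           // tests every pair
                 select c.Name + " bought a " + p.Description).ToArray();

string[] fast = (from c in customers
                 join p in purchases on Key (c.ID) equals Key (p.CustomerID)
                 select c.Name + " bought a " + p.Description).ToArray();

Console.WriteLine (slow.SequenceEqual (fast));   // True
Console.WriteLine (comparisons);                 // 16 (4 customers x 4 purchases)
Console.WriteLine (keyCalls);                    // 8: one per customer, one per purchase
```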


The query syntax for join can be written in general terms, as follows: 


join inner-var in inner-sequence on outer-key-expr equals inner-key-expr 
Join operators in LINQ differentiate between the outer sequence and inner sequence. Syntactically:

• The outer sequence is the input sequence (in this case, customers).

• The inner sequence is the new collection you introduce (in this case, purchases).


Join performs inner joins, meaning customers without purchases are excluded 
from the output. With inner joins, you can swap the inner and outer sequences in 
the query and still get the same results: 


from p in purchases // p is now outer 
join c in customers on p.CustomerID equals c.ID // c is now inner 


You can add further join clauses to the same query. If each purchase, for instance, 
has one or more purchase items, you could join the purchase items, as follows: 


from c in customers 
join p in purchases on c.ID equals p.CustomerID // first join 
join pi in purchaseItems on p.ID equals pi.PurchaseID // second join 


purchases acts as the inner sequence in the first join and as the outer sequence in 
the second join. You could obtain the same results (inefficiently) using nested 
foreach statements, as follows: 


foreach (Customer c in customers)
  foreach (Purchase p in purchases)
    if (c.ID == p.CustomerID)
      foreach (PurchaseItem pi in purchaseItems)
        if (p.ID == pi.PurchaseID)
          Console.WriteLine (c.Name + "," + p.Price + "," + pi.Detail);


In query syntax, variables from earlier joins remain in scope—just as they do with 
SelectMany-style queries. You're also permitted to insert where and let clauses in 
between join clauses. 
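The following local sketch chains two joins and places a where clause between them. It uses C# 9 top-level statements, and the customers, purchases, and purchaseItems arrays are hypothetical data. The final select still sees c, p, and pi.

```csharp
using System;
using System.Linq;

// Hypothetical local data: customers, purchases, and purchase line items.
var customers = new[] { new { ID = 1, Name = "Tom" }, new { ID = 2, Name = "Dick" } };
var purchases = new[]
{
    new { ID = 10, CustomerID = 1, Price = 1500m },
    new { ID = 11, CustomerID = 2, Price = 300m }
};
var purchaseItems = new[]
{
    new { PurchaseID = 10, Detail = "Frame" },
    new { PurchaseID = 10, Detail = "Wheels" },
    new { PurchaseID = 11, Detail = "Charger" }
};

string[] lines =
   (from c in customers
    join p in purchases on c.ID equals p.CustomerID          // first join
    where p.Price > 1000                                      // filter between joins
    join pi in purchaseItems on p.ID equals pi.PurchaseID     // second join
    select c.Name + "," + p.Price + "," + pi.Detail).ToArray();

foreach (string line in lines) Console.WriteLine (line);
// Tom,1500,Frame
// Tom,1500,Wheels
```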


Joining on multiple keys 
You can join on multiple keys with anonymous types, as follows: 


from x in sequenceX 
join y in sequenceY on new { K1 = x.Prop1, K2 = x.Prop2 } 
equals new { K1 = y.Prop3, K2 = y.Prop4 } 


For this to work, the two anonymous types must be structured identically. The compiler then implements each with the same internal type, making the joining keys compatible.
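A runnable composite-key join follows. It uses C# 9 top-level statements, and the targets and actuals arrays and their property names are hypothetical. It works because compiler-generated anonymous types override Equals and GetHashCode to compare property values, so two keys built from equal values match in the lookup.

```csharp
using System;
using System.Linq;

// Hypothetical data keyed by region and year, with differing property names.
var targets = new[]
{
    new { Region = "North", Year = 2019, Target = 100 },
    new { Region = "South", Year = 2019, Target = 80 },
    new { Region = "North", Year = 2020, Target = 120 }
};
var actuals = new[]
{
    new { Area = "North", Period = 2019, Sales = 95 },
    new { Area = "North", Period = 2020, Sales = 130 },
    new { Area = "South", Period = 2020, Sales = 70 }
};

string[] lines =
   (from t in targets
    join a in actuals
      on new { K1 = t.Region, K2 = t.Year }
      equals new { K1 = a.Area, K2 = a.Period }     // same names, same types
    select t.Region + " " + t.Year + ": " + a.Sales + "/" + t.Target).ToArray();

foreach (string line in lines) Console.WriteLine (line);
// North 2019: 95/100
// North 2020: 130/120
```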
Joining in fluent syntax
The following query syntax join: 


from c in customers 
join p in purchases on c.ID equals p.CustomerID 
select new { c.Name, p.Description, p.Price }; 


in fluent syntax is as follows: 


customers.Join (                               // outer collection
      purchases,                               // inner collection
      c => c.ID,                               // outer key selector
      p => p.CustomerID,                       // inner key selector
      (c, p) => new
         { c.Name, p.Description, p.Price }    // result selector
);


The result selector expression at the end creates each element in the output 
sequence. If you have additional clauses prior to projecting, such as orderby in this 
example: 


from c in customers
join p in purchases on c.ID equals p.CustomerID
orderby p.Price
select c.Name + " bought a " + p.Description;


you must manufacture a temporary anonymous type in the result selector in fluent 
syntax. This keeps both c and p in scope following the join: 


customers.Join (                               // outer collection
      purchases,                               // inner collection
      c => c.ID,                               // outer key selector
      p => p.CustomerID,                       // inner key selector
      (c, p) => new { c, p } )                 // result selector
  .OrderBy (x => x.p.Price)
  .Select  (x => x.c.Name + " bought a " + x.p.Description);


Query syntax is usually preferable when joining; it’s less fiddly. 


GroupJoin 


GroupJoin does the same work as Join, but instead of yielding a flat result, it yields 
a hierarchical result, grouped by each outer element. It also allows left outer joins. 
GroupJoin is not currently supported in EF Core. 


The query syntax for GroupJoin is the same as for Join, but is followed by the into 
keyword. 


Here's the most basic example, using a local query: 


Customer[] customers = dbContext.Customers.ToArray(); 
Purchase[] purchases = dbContext.Purchases.ToArray(); 


IEnumerable<IEnumerable<Purchase>> query =
  from c in customers
  join p in purchases on c.ID equals p.CustomerID
  into custPurchases
  select custPurchases;    // custPurchases is a sequence


An into clause translates to GroupJoin only when it appears directly after a join clause. After a select or group clause, it means query continuation. The two uses of the into keyword are quite different, although they have one feature in common: they both introduce a new range variable.


The result is a sequence of sequences, which we could enumerate as follows: 


foreach (IEnumerable<Purchase> purchaseSequence in query) 
foreach (Purchase p in purchaseSequence) 
Console.WriteLine (p.Description); 


This isn’t very useful, however, because purchaseSequence has no reference to the customer. More commonly, you'd do this:


from c in customers 
join p in purchases on c.ID equals p.CustomerID 
into custPurchases 
select new { CustName = c.Name, custPurchases }; 


This gives the same results as the following (inefficient) Select subquery: 


from c in customers
select new
{
  CustName = c.Name,
  custPurchases = purchases.Where (p => c.ID == p.CustomerID)
};
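The following local sketch shows GroupJoin's left-outer behavior. It uses C# 9 top-level statements, and the data is hypothetical. Mary has no purchases but still gets a row, with an empty subsequence. The sketch also writes the same GroupJoin in fluent syntax, where the result selector receives each outer element together with its matching inner elements.

```csharp
using System;
using System.Linq;

// Hypothetical local data; Mary has no purchases.
var customers = new[]
{
    new { ID = 1, Name = "Tom" },
    new { ID = 2, Name = "Dick" },
    new { ID = 3, Name = "Mary" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike" },
    new { CustomerID = 1, Description = "Holiday" },
    new { CustomerID = 2, Description = "Phone" }
};

var viaQuery =
   (from c in customers
    join p in purchases on c.ID equals p.CustomerID
    into custPurchases
    select new { CustName = c.Name, NumPurchases = custPurchases.Count() }).ToList();

// Fluent form: (outer element, matching inner elements) => result.
var viaFluent = customers
    .GroupJoin (purchases, c => c.ID, p => p.CustomerID,
                (c, custPurchases) => new { CustName = c.Name,
                                            NumPurchases = custPurchases.Count() })
    .ToList();

foreach (var x in viaQuery) Console.WriteLine (x.CustName + ": " + x.NumPurchases);
// Tom: 2
// Dick: 1
// Mary: 0
```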


By default, GroupJoin does the equivalent of a left outer join. To get an inner join, whereby customers without purchases are excluded, you need to filter on custPurchases:


from c in customers
join p in purchases on c.ID equals p.CustomerID
into custPurchases
where custPurchases.Any()
select ...


Clauses after a group-join into operate on subsequences of inner child elements, not 
individual child elements. This means that to filter individual purchases, you'd need 
to call Where before joining: 

from c in customers
join p in purchases.Where (p2 => p2.Price > 1000)
  on c.ID equals p.CustomerID
into custPurchases ...


You can construct lambda queries with GroupJoin as you would with Join. 
Flat outer joins


You run into a dilemma if you want both an outer join and a flat result set. GroupJoin gives you the outer join; Join gives you the flat result set. The solution is to first call GroupJoin, then DefaultIfEmpty on each child sequence, and then finally SelectMany on the result:


from c in customers
join p in purchases on c.ID equals p.CustomerID into custPurchases
from cp in custPurchases.DefaultIfEmpty()
select new
{
  CustName = c.Name,
  Price = cp == null ? (decimal?) null : cp.Price
};


DefaultIfEmpty emits a sequence with a single null value if a subsequence of
purchases is empty. The second from clause translates to SelectMany. In this role, it
expands and flattens all the purchase subsequences, concatenating them into a single
sequence of purchase elements.
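Here is a runnable sketch of the flat outer join over hypothetical in-memory data. Anonymous types are used instead of tuples because they are reference types, so DefaultIfEmpty produces null (rather than a default struct) for a customer with no purchases, which is what the cp == null test relies on.

```csharp
// Top-level program (C# 9 or later); the sample data is hypothetical.
using System;
using System.Linq;

var customers = new[]
{
    new { ID = 1, Name = "Tom" },
    new { ID = 2, Name = "Dick" },
    new { ID = 3, Name = "Harry" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike",    Price = 500m },
    new { CustomerID = 1, Description = "Holiday", Price = 2000m },
    new { CustomerID = 2, Description = "Phone",   Price = 300m }
};

var flat =
    (from c in customers
     join p in purchases on c.ID equals p.CustomerID into custPurchases
     from cp in custPurchases.DefaultIfEmpty()   // null when a customer has no purchases
     select new
     {
         CustName = c.Name,
         Price = cp == null ? (decimal?) null : cp.Price
     }).ToList();

// One row per purchase, plus one row (with a null Price) for Harry
foreach (var row in flat)
    Console.WriteLine (row.CustName + " " + row.Price);
```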


Joining with lookups 


The Join and GroupJoin methods in Enumerable work in two steps. First, they load
the inner sequence into a lookup. Second, they query the outer sequence in
combination with the lookup.


A lookup is a sequence of groupings that can be accessed directly by key. Another
way to think of it is as a dictionary of sequences—a dictionary that can accept many
elements under each key (sometimes called a multidictionary). Lookups are
read-only and defined by the following interface:


public interface ILookup<TKey,TElement> :
  IEnumerable<IGrouping<TKey,TElement>>, IEnumerable
{
  int Count { get; }
  bool Contains (TKey key);
  IEnumerable<TElement> this [TKey key] { get; }
}


The joining operators—like other sequence-emitting operators—honor
deferred or lazy execution semantics. This means the lookup is not built until
you begin enumerating the output sequence (and then the entire lookup is
built right then).


You can create and query lookups manually as an alternative strategy to using the 
joining operators, when dealing with local collections. There are a couple of benefits 
in doing so: 
• You can reuse the same lookup over multiple queries—as well as in ordinary
  imperative code.

• Querying a lookup is an excellent way of understanding how Join and
  GroupJoin work.


The ToLookup extension method creates a lookup. The following loads all purchases 
into a lookup—keyed by their CustomerID: 


ILookup<int?,Purchase> purchLookup = 
purchases.ToLookup (p => p.CustomerID, p => p); 


The first argument selects the key; the second argument selects the objects that are 
to be loaded as values into the lookup. 


Reading a lookup is rather like reading a dictionary except that the indexer returns a
sequence of matching items rather than a single matching item. The following
enumerates all purchases made by the customer whose ID is 1:


foreach (Purchase p in purchLookup [1])
  Console.WriteLine (p.Description);
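The following sketch exercises the ILookup members listed above on hypothetical data, using the ToLookup overload that takes both a key selector and an element selector. It also shows one behavioral difference from a dictionary: indexing with an absent key returns an empty sequence instead of throwing.

```csharp
// Top-level program (C# 9 or later); the sample data is hypothetical.
using System;
using System.Linq;

var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike" },
    new { CustomerID = 1, Description = "Holiday" },
    new { CustomerID = 2, Description = "Phone" }
};

// Key selector: CustomerID; element selector: just the description
ILookup<int, string> lookup = purchases.ToLookup (p => p.CustomerID, p => p.Description);

Console.WriteLine (lookup.Count);          // number of distinct keys: 2
Console.WriteLine (lookup.Contains (1));   // True
foreach (string d in lookup [1])
    Console.WriteLine (d);                  // Bike, then Holiday

// A missing key yields an empty sequence rather than an exception
Console.WriteLine (lookup [99].Count());   // 0
```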


With a lookup in place, you can write SelectMany/Select queries that execute as 
efficiently as Join/GroupJoin queries. Join is equivalent to using SelectMany on a 
lookup: 


from c in customers
from p in purchLookup [c.ID]
select new { c.Name, p.Description, p.Price };


Tom Bike 500 
Tom Holiday 2000 
Dick Bike 600 
Dick Phone 300 
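To see the equivalence concretely, this sketch runs both forms over the same hypothetical data and compares the results. Because Join preserves the order of the outer sequence and, within each key, the order of the inner sequence, the two lists should match element for element.

```csharp
// Top-level program (C# 9 or later); the sample data is hypothetical.
using System;
using System.Linq;

var customers = new[]
{
    new { ID = 1, Name = "Tom" }, new { ID = 2, Name = "Dick" }, new { ID = 3, Name = "Harry" }
};
var purchases = new[]
{
    new { CustomerID = 1, Description = "Bike",    Price = 500m },
    new { CustomerID = 1, Description = "Holiday", Price = 2000m },
    new { CustomerID = 2, Description = "Phone",   Price = 300m }
};

var purchLookup = purchases.ToLookup (p => p.CustomerID);

// SelectMany over the lookup...
var viaLookup = (from c in customers
                 from p in purchLookup [c.ID]
                 select $"{c.Name} {p.Description}").ToList();

// ...produces the same rows as an equivalent Join
var viaJoin = (from c in customers
               join p in purchases on c.ID equals p.CustomerID
               select $"{c.Name} {p.Description}").ToList();

Console.WriteLine (string.Join (" | ", viaLookup));   // Tom Bike | Tom Holiday | Dick Phone
```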


Adding a call to DefaultIfEmpty makes this into an outer join: 


from c in customers
from p in purchLookup [c.ID].DefaultIfEmpty()
select new {
  c.Name,
  Descript = p == null ? null : p.Description,
  Price = p == null ? (decimal?) null : p.Price
};


GroupJoin is equivalent to reading the lookup inside a projection: 


from c in customers
select new {
  CustName = c.Name,
  CustPurchases = purchLookup [c.ID]
};
Enumerable implementations
Here's the simplest valid implementation of Enumerable.Join, null checking aside:


public static IEnumerable<TResult> Join
  <TOuter,TInner,TKey,TResult> (
  this IEnumerable<TOuter> outer,
  IEnumerable<TInner> inner,
  Func<TOuter,TKey> outerKeySelector,
  Func<TInner,TKey> innerKeySelector,
  Func<TOuter,TInner,TResult> resultSelector)
{
  ILookup<TKey,TInner> lookup = inner.ToLookup (innerKeySelector);
  return
    from outerItem in outer
    from innerItem in lookup [outerKeySelector (outerItem)]
    select resultSelector (outerItem, innerItem);
}
GroupJoin’s implementation is like that of Join but simpler: 


public static IEnumerable<TResult> GroupJoin
  <TOuter,TInner,TKey,TResult> (
  this IEnumerable<TOuter> outer,
  IEnumerable<TInner> inner,
  Func<TOuter,TKey> outerKeySelector,
  Func<TInner,TKey> innerKeySelector,
  Func<TOuter,IEnumerable<TInner>,TResult> resultSelector)
{
  ILookup<TKey,TInner> lookup = inner.ToLookup (innerKeySelector);
  return
    from outerItem in outer
    select resultSelector
      (outerItem, lookup [outerKeySelector (outerItem)]);
}
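To check that the lookup-based Join sketch behaves like the real operator, the following example copies it into a local generic function and compares its output with Enumerable.Join. MyJoin is a hypothetical name; the this modifier is dropped because local functions cannot be extension methods. The input (keys matched against the first character of each word) is also made up for illustration.

```csharp
// Top-level program (C# 9 or later). MyJoin is a hypothetical local copy of
// the lookup-based Join shown above, minus the 'this' modifier.
using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<TResult> MyJoin<TOuter, TInner, TKey, TResult> (
    IEnumerable<TOuter> outer,
    IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector)
{
    // Step 1: load the inner sequence into a lookup
    ILookup<TKey, TInner> lookup = inner.ToLookup (innerKeySelector);
    // Step 2: query the outer sequence against the lookup
    return
        from outerItem in outer
        from innerItem in lookup [outerKeySelector (outerItem)]
        select resultSelector (outerItem, innerItem);
}

int[] keys = { 1, 2, 3 };
string[] words = { "1a", "2a", "1b", "4a" };

// Inner key: the numeric value of each word's first character
var mine    = MyJoin (keys, words, k => k, w => w[0] - '0', (k, w) => w).ToList();
var builtIn = keys.Join (words, k => k, w => w[0] - '0', (k, w) => w).ToList();

Console.WriteLine (string.Join (", ", mine));   // 1a, 1b, 2a
```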


The Zip Operator 
IEnumerable<TFirst>, IEnumerable<TSecond>→IEnumerable<TResult>


The Zip operator was added in Framework 4.0. It enumerates two sequences in step 
(like a zipper), returning a sequence based on applying a function over each element 
pair. For instance, the following: 


int[] numbers = { 3, 5, 7 }; 
string[] words = { "three", "five", "seven", "ignored" }; 
IEnumerable<string> zip = numbers.Zip (words, (n, w) => n + "=" + w);


produces a sequence with the following elements: 


3=three 
5=five 
7=seven 


Extra elements in either input sequence are ignored. Zip is not supported by EF 
Core. 
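A short sketch of Zip follows. Besides the result-selector overload shown above, it uses the overload without a result selector, which returns (First, Second) tuples; that overload is an assumption about your runtime, since it exists in .NET Core 3.0 and later but not in the .NET Framework release that introduced Zip.

```csharp
// Top-level program (C# 9 or later); requires .NET Core 3.0+ for the tuple overload.
using System;
using System.Linq;

int[] numbers = { 3, 5, 7 };
string[] words = { "three", "five", "seven", "ignored" };

// Result-selector overload: one output element per pair
var zip = numbers.Zip (words, (n, w) => n + "=" + w).ToList();

// Tuple overload: pairs the elements without a result selector
var pairs = numbers.Zip (words).ToList();

Console.WriteLine (string.Join (", ", zip));                  // 3=three, 5=five, 7=seven
Console.WriteLine (pairs[1].First + " " + pairs[1].Second);   // 5 five
```

In both cases the surplus "ignored" element is dropped, because enumeration stops as soon as either input runs out.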
Ordering


IEnumerable<TSource>→IOrderedEnumerable<TSource>


Method                               Description                            SQL equivalents
OrderBy, ThenBy                      Sorts a sequence in ascending order    ORDER BY ...
OrderByDescending, ThenByDescending  Sorts a sequence in descending order   ORDER BY ... DESC
Reverse                              Returns a sequence in reverse order    Exception thrown





Ordering operators return the same elements in a different order. 
OrderBy, OrderByDescending, ThenBy, and ThenByDescending 


OrderBy and OrderByDescending arguments 


Argument          Type
Input sequence    IEnumerable<TSource>
Key selector      TSource => TKey

Return type = IOrderedEnumerable<TSource>


ThenBy and ThenByDescending arguments 


Argument          Type
Input sequence    IOrderedEnumerable<TSource>
Key selector      TSource => TKey





Query syntax 


orderby expression1 [descending] [, expression2 [descending] ... ] 


Overview 


OrderBy returns a sorted version of the input sequence, using the keySelector 
expression to make comparisons. The following query emits a sequence of names in 
alphabetical order: 


IEnumerable<string> query = names.OrderBy (s => s);
The following sorts names by length: 


IEnumerable<string> query = names.OrderBy (s => s.Length); 


// Result: { "Jay", "Tom", "Mary", "Dick", "Harry" }; 
The relative order of elements with the same sorting key (in this case, Jay/Tom and
Mary/Dick) is indeterminate—unless you append a ThenBy operator: 


IEnumerable<string> query = names.OrderBy (s => s.Length).ThenBy (s => s); 


// Result: { "Jay", "Tom", "Dick", "Mary", "Harry" }; 


ThenBy reorders only elements that had the same sorting key in the preceding sort. 
You can chain any number of ThenBy operators. The following sorts first by length, 
then by the second character, and finally by the first character: 


names.OrderBy (s => s.Length).ThenBy (s => s[1]).ThenBy (s => s[0]); 
Here's the equivalent in query syntax: 


from s in names 
orderby s.Length, s[1], s[0] 
select s; 


The following variation is incorrect—it will actually order first
by s[1] and then by s.Length (or in the case of a database
query, it will order only by s[1] and discard the former
ordering):

from s in names
orderby s.Length
orderby s[1]
select s;
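The difference between one orderby clause with several keys and two separate orderby clauses can be seen with a local query. This sketch assumes the names array used earlier in the chapter. The second orderby re-sorts the whole sequence by s[1]; since Enumerable.OrderBy performs a stable sort, ties keep the length order from the first sort, which is why the result looks "ordered by s[1] and then by s.Length".

```csharp
// Top-level program (C# 9 or later). The names array is assumed to be the
// one used earlier in the chapter.
using System;
using System.Linq;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// One orderby clause with several keys: OrderBy, then ThenBy, ThenBy
var correct = (from s in names
               orderby s.Length, s[1], s[0]
               select s).ToList();

// Two orderby clauses: the second OrderBy re-sorts everything by s[1]
var incorrect = (from s in names
                 orderby s.Length
                 orderby s[1]
                 select s).ToList();

Console.WriteLine (string.Join (", ", correct));     // Jay, Tom, Mary, Dick, Harry
Console.WriteLine (string.Join (", ", incorrect));   // Jay, Mary, Harry, Dick, Tom
```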


LINQ also provides OrderByDescending and ThenByDescending operators, which 
do the same things, emitting the results in reverse order. The following EF Core 
query retrieves purchases in descending order of price, with those of the same price 
listed alphabetically: 


dbContext.Purchases.OrderByDescending (p => p.Price) 
.ThenBy (p => p.Description); 


In query syntax: 


from p in dbContext.Purchases 
orderby p.Price descending, p.Description 
select p; 


Comparers and collations 


In a local query, the key selector objects themselves determine the ordering
algorithm via their default IComparable implementation (see Chapter 7). You can
override the sorting algorithm by passing in an IComparer object. The following
performs a case-insensitive sort:


names.OrderBy (n => n, StringComparer.CurrentCultureIgnoreCase);
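The effect of passing a comparer is easiest to see with mixed-case input. This sketch uses hypothetical data and the ordinal comparers (StringComparer.Ordinal and StringComparer.OrdinalIgnoreCase) rather than the culture-sensitive one above, so that the output does not depend on the current culture.

```csharp
// Top-level program (C# 9 or later); the sample data is hypothetical.
using System;
using System.Linq;

string[] fruit = { "cherry", "apple", "Banana" };

// Ordinal comparison ranks every uppercase letter before any lowercase letter
var caseSensitive = fruit.OrderBy (f => f, StringComparer.Ordinal).ToList();

// A case-insensitive comparer changes the sorting algorithm, not the keys
var caseInsensitive = fruit.OrderBy (f => f, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine (string.Join (", ", caseSensitive));     // Banana, apple, cherry
Console.WriteLine (string.Join (", ", caseInsensitive));   // apple, Banana, cherry
```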


Passing in a comparer is not supported in query syntax, nor in any way by EF Core.
When querying a database, the comparison algorithm is determined by the
participating column's collation. If the collation is case sensitive, you can request a
case-insensitive sort by calling ToUpper in the key selector:

from p in dbContext.Purchases
orderby p.Description.ToUpper()
select p;


IOrderedEnumerable and IOrderedQueryable


The ordering operators return special subtypes of IEnumerable<T>. Those in
Enumerable return IOrderedEnumerable<TSource>; those in Queryable return
IOrderedQueryable<TSource>. These subtypes allow a subsequent ThenBy operator
to refine rather than replace the existing ordering.


The additional members that these subtypes define are not publicly exposed, so they 
present like ordinary sequences. The fact that they are different types comes into 
play when building queries progressively: 


IOrderedEnumerable<string> query1 = names.OrderBy (s => s.Length); 
IOrderedEnumerable<string> query2 = query1.ThenBy (s => s); 


If we had instead declared query1 to be of type IEnumerable<string>, the second line
would not compile—ThenBy requires an input of type IOrderedEnumerable<string>. You
can avoid worrying about this by implicitly typing the query variables:


var query1 = names.OrderBy (s => s.Length); 
var query2 = query1.ThenBy (s => s); 


Implicit typing can create problems of its own, though. The following will not 
compile: 


var query = names.OrderBy (s => s.Length); 
query = query.Where (n => n.Length > 3); // Compile-time error 


The compiler infers query to be of type IOrderedEnumerable<string>, based on
OrderBy’s output sequence type. However, the Where on the next line returns an 
ordinary IEnumerable<string>, which cannot be assigned back to query. You can 
work around this either with explicit typing or by calling AsEnumerable() after 
OrderBy: 


var query = names.OrderBy (s => s.Length).AsEnumerable(); 
query = query.Where (n => n.Length > 3); // OK 


The equivalent in interpreted queries is to call AsQueryable. 
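Both techniques can be combined when building a query progressively. In this sketch (the names array is assumed from earlier in the chapter, and alsoSortAlphabetically is a hypothetical runtime flag), an explicitly typed IOrderedEnumerable<string> lets a ThenBy be added conditionally, while AsEnumerable lets a var-typed query accept the result of Where.

```csharp
// Top-level program (C# 9 or later). names is assumed from earlier in the
// chapter; alsoSortAlphabetically is a hypothetical runtime condition.
using System;
using System.Linq;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };
bool alsoSortAlphabetically = true;

// Declared as IOrderedEnumerable<string>, so a later ThenBy can refine the sort
IOrderedEnumerable<string> ordered = names.OrderBy (s => s.Length);
if (alsoSortAlphabetically)
    ordered = ordered.ThenBy (s => s);

// AsEnumerable() makes var infer IEnumerable<string>, so Where's result fits
var query = names.OrderBy (s => s.Length).AsEnumerable();
query = query.Where (n => n.Length > 3);

Console.WriteLine (string.Join (", ", ordered));   // Jay, Tom, Dick, Mary, Harry
Console.WriteLine (string.Join (", ", query));     // Dick, Mary, Harry
```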


Grouping 


IEnumerable<TSource>→IEnumerable<IGrouping<TKey,TElement>>


Method    Description                            SQL equivalents
GroupBy   Groups a sequence into subsequences    GROUP BY
GroupBy


Argument                      Type
Input sequence                IEnumerable<TSource>
Key selector                  TSource => TKey
Element selector (optional)   TSource => TElement
Comparer (optional)           IEqualityComparer<TKey>





Query syntax 


group element-expression by key-expression 


Overview 


GroupBy organizes a flat input sequence into sequences of groups. For example, the 
following organizes all of the files in Path.GetTempPath() by extension: 


string[] files = Directory.GetFiles (Path.GetTempPath()); 


IEnumerable<IGrouping<string,string>> query =
  files.GroupBy (file => Path.GetExtension (file));


Or, with implicit typing: 
var query = files.GroupBy (file => Path.GetExtension (file)); 


Here’s how to enumerate the result: 


foreach (IGrouping<string,string> grouping in query)
{
  Console.WriteLine ("Extension: " + grouping.Key);
  foreach (string filename in grouping)
    Console.WriteLine ("   - " + filename);
}


Extension: .pdf
   - chapter03.pdf
   - chapter04.pdf
Extension: .doc
   - todo.doc
   - menu.doc
   - Copy of menu.doc


Enumerable.GroupBy works by reading the input elements into a temporary dictionary
of lists so that all elements with the same key end up in the same sublist. It then
emits a sequence of groupings. A grouping is a sequence with a Key property:


public interface IGrouping<TKey,TElement> : IEnumerable<TElement>,
                                            IEnumerable
{
  TKey Key { get; }    // Key applies to the subsequence as a whole
}
By default, the elements in each grouping are untransformed input elements unless
you specify an elementSelector argument. The following projects each input
element to uppercase:


files.GroupBy (file => Path.GetExtension (file), file => file.ToUpper());


An elementSelector is independent of the keySelector. In our case, this means 
that the Key on each grouping is still in its original case: 


Extension: .pdf
   - CHAPTER03.PDF
   - CHAPTER04.PDF
Extension: .doc
   - TODO.DOC


Note that the subcollections are not emitted in alphabetical order of key. GroupBy 
merely groups; it does not sort. In fact, it preserves the original ordering. To sort, 
you must add an OrderBy operator: 


files.GroupBy (file => Path.GetExtension (file), file => file.ToUpper()) 
.OrderBy (grouping => grouping.Key); 
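Because the real contents of the temp folder vary from machine to machine, this sketch replaces Directory.GetFiles with a hypothetical array of file names. It shows that the element selector upper-cases the elements but not the keys, and that OrderBy is what puts .doc before .pdf (without it, .pdf would come first because that key is encountered first).

```csharp
// Top-level program (C# 9 or later). The file names are hypothetical,
// standing in for Directory.GetFiles (Path.GetTempPath()).
using System;
using System.IO;
using System.Linq;

string[] files = { "chapter03.pdf", "todo.doc", "chapter04.pdf", "menu.doc", "Copy of menu.doc" };

var query = files
    .GroupBy (file => Path.GetExtension (file), file => file.ToUpper())
    .OrderBy (grouping => grouping.Key);

foreach (IGrouping<string, string> grouping in query)
{
    Console.WriteLine ("Extension: " + grouping.Key);   // key keeps its original case
    foreach (string filename in grouping)
        Console.WriteLine ("   - " + filename);         // elements were upper-cased
}
```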


GroupBy has a simple and direct translation in query syntax: 
group element-expr by key-expr 
Here's our example in query syntax: 


from file in files 
group file.ToUpper() by Path.GetExtension (file); 


As with select, group “ends” a query—unless you add a query continuation clause: 


from file in files
group file.ToUpper() by Path.GetExtension (file) into grouping
orderby grouping.Key
select grouping;


Query continuations are often useful in a GroupBy query. The next query filters out
groups that have fewer than five files in them:


from file in files
group file.ToUpper() by Path.GetExtension (file) into grouping
where grouping.Count() >= 5
select grouping;


A where after a GroupBy is equivalent to HAVING in SQL. It 
applies to each subsequence or grouping as a whole rather 
than the individual elements. 


Sometimes, you're interested purely in the result of an aggregation on a grouping
and so can abandon the subsequences:


string[] votes = { "Dogs", "Cats", "Cats", "Dogs", "Dogs" }; 


IEnumerable<string> query = from vote in votes
                            group vote by vote into g
                            orderby g.Count() descending
                            select g.Key;


string winner = query.First(); // Dogs 
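A variation on the same votes keeps the aggregate alongside each key and also applies a HAVING-style filter through the query continuation. This is an illustrative sketch; the threshold of three votes is arbitrary.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Linq;

string[] votes = { "Dogs", "Cats", "Cats", "Dogs", "Dogs" };

// Project both the key and the aggregate for each grouping
var tally = (from vote in votes
             group vote by vote into g
             orderby g.Count() descending
             select new { Choice = g.Key, Votes = g.Count() }).ToList();

// A where after the continuation filters whole groups, like SQL's HAVING
var landslides = (from vote in votes
                  group vote by vote into g
                  where g.Count() >= 3
                  select g.Key).ToList();

foreach (var t in tally)
    Console.WriteLine (t.Choice + ": " + t.Votes);    // Dogs: 3, then Cats: 2
Console.WriteLine (string.Join (", ", landslides));   // Dogs
```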


GroupBy in EF Core 


Grouping works in the same way when querying a database. If you have navigation
properties set up, you'll find, however, that the need to group arises less frequently
than with standard SQL. For instance, to select customers with at least two
purchases, you don't need to group; the following query does the job nicely:


from c in dbContext.Customers
where c.Purchases.Count >= 2
select c.Name + " has made "
       + c.Purchases.Count + " purchases";


An example of when you might use grouping is to list total sales by year: 


from p in dbContext.Purchases 
group p.Price by p.Date.Year into salesByYear 
select new { 
Year = salesByYear.Key, 
  TotalValue = salesByYear.Sum()
}; 
LINQ's grouping is more powerful than SQL's GROUP BY in that you can fetch all
detail rows without any aggregation:


from p in dbContext.Purchases
group p by p.Date.Year;


However, this doesn't work in EF Core. An easy workaround is to
call .AsEnumerable() just before grouping so that the grouping happens on the
client. This is no less efficient as long as you perform any filtering before grouping so
that you only fetch the data you need from the server.


Another departure from traditional SQL comes in there being no obligation to 
project the variables or expressions used in grouping or sorting. 


Grouping by multiple keys
You can group by a composite key, using an anonymous type:
from n in names
group n by new { FirstLetter = n[0], Length = n.Length };
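Composite keys work because anonymous types implement value equality: two keys with equal FirstLetter and Length values land in the same group. The names below are hypothetical, chosen so that one group holds more than one element.

```csharp
// Top-level program (C# 9 or later); the names are hypothetical.
using System;
using System.Linq;

string[] names = { "Tom", "Tim", "Tony", "Dick", "Dan" };

// Anonymous-type keys compare by value, member by member
var groups = (from n in names
              group n by new { FirstLetter = n[0], Length = n.Length }).ToList();

foreach (var g in groups)
    Console.WriteLine (g.Key.FirstLetter + "/" + g.Key.Length + ": " + string.Join (", ", g));
// T/3: Tom, Tim
// T/4: Tony
// D/4: Dick
// D/3: Dan
```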
Custom equality comparers 


You can pass a custom equality comparer into GroupBy, in a local query, to change 
the algorithm for key comparison. Rarely is this required, though, because changing 
the key selector expression is usually sufficient. For instance, the following creates a 
case-insensitive grouping: 
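One way to get a case-insensitive grouping, sketched here on hypothetical data, is to normalize the key in the key selector; the alternative is to pass an IEqualityComparer<string> such as StringComparer.OrdinalIgnoreCase. With the comparer, each group's Key is the first key encountered ("tom" here), not a normalized form.

```csharp
// Top-level program (C# 9 or later); the names are hypothetical.
using System;
using System.Linq;

string[] names = { "tom", "Tom", "TOM", "dick" };

// Option 1: change the key selector expression
var byUpperKey = names.GroupBy (n => n.ToUpperInvariant()).ToList();

// Option 2: keep the key as is and supply a custom equality comparer
var byComparer = names.GroupBy (n => n, StringComparer.OrdinalIgnoreCase).ToList();

Console.WriteLine (byUpperKey[0].Key + " x" + byUpperKey[0].Count());   // TOM x3
Console.WriteLine (byComparer[0].Key + " x" + byComparer[0].Count());   // tom x3
```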
Set Operators


IEnumerable<TSource>, IEnumerable<TSource>→IEnumerable<TSource>


Method     Description                                            SQL equivalents
Concat     Returns a concatenation of elements in each of the     UNION ALL
           two sequences
Union      Returns a concatenation of elements in each of the     UNION
           two sequences, excluding duplicates
Intersect  Returns elements present in both sequences             WHERE ... IN (...)
Except     Returns elements present in the first, but not the     EXCEPT or
           second sequence                                        WHERE ... NOT IN (...)
Concat and Union 


Concat returns all the elements of the first sequence, followed by all the elements of 
the second. Union does the same but removes any duplicates: 


int[] seq1 = { 1, 2, 3 }, seq2 = { 3, 4, 5 };


IEnumerable<int>
  concat = seq1.Concat (seq2),    // { 1, 2, 3, 3, 4, 5 }
  union  = seq1.Union (seq2);     // { 1, 2, 3, 4, 5 }


Specifying the type argument explicitly is useful when the sequences are differently 
typed, but the elements have a common base type. For instance, with the reflection 
API (Chapter 19), methods and properties are represented with MethodInfo and 
PropertyInfo classes, which have a common base class called MemberInfo. We can 
concatenate methods and properties by stating that base class explicitly when calling 
Concat: 


MethodInfo[] methods = typeof (string).GetMethods(); 
PropertyInfo[] props = typeof (string).GetProperties(); 
IEnumerable<MemberInfo> both = methods.Concat<MemberInfo> (props);


In the next example, we filter the methods before concatenating: 


var methods = typeof (string).GetMethods().Where (m => !m.IsSpecialName);
var props = typeof (string).GetProperties(); 
var both = methods.Concat<MemberInfo> (props); 


This example relies on interface type parameter variance: methods is of type
IEnumerable<MethodInfo>, which requires a covariant conversion to
IEnumerable<MemberInfo>. It's a good illustration of how variance makes things work
more like you'd expect.
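The covariant Concat can be run directly; the exact member counts depend on the runtime version, so this sketch checks only that nothing is lost and that both member kinds end up in the combined IEnumerable<MemberInfo>.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

IEnumerable<MethodInfo> methods = typeof (string).GetMethods().Where (m => !m.IsSpecialName);
PropertyInfo[] props = typeof (string).GetProperties();

// With TSource = MemberInfo, both inputs convert to IEnumerable<MemberInfo>
// through covariance (and array-to-interface conversion for props)
IEnumerable<MemberInfo> both = methods.Concat<MemberInfo> (props);

int total = both.Count();
Console.WriteLine (total == methods.Count() + props.Length);   // True
```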
Intersect and Except


Intersect returns the elements that two sequences have in common. Except 
returns the elements in the first input sequence that are not present in the second: 


int[] seq1 = { 1, 2, 3 }, seq2 = { 3, 4, 5 };


IEnumerable<int>
  commonality = seq1.Intersect (seq2),    //  { 3 }
  difference1 = seq1.Except (seq2),       //  { 1, 2 }
  difference2 = seq2.Except (seq1);       //  { 4, 5 }
Enumerable.Except works internally by loading all of the elements in the first
collection into a dictionary and then removing from the dictionary all elements
present in the second sequence. The equivalent in SQL is a NOT EXISTS or NOT IN
subquery:
SELECT number FROM numbers1Table
WHERE number NOT IN (SELECT number FROM numbers2Table)
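The four set operators can be compared side by side on the same inputs. The last line of this sketch adds one detail not spelled out above: Except, like Union and Intersect, has set semantics, so duplicates within the first sequence are collapsed as well.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Linq;

int[] seq1 = { 1, 2, 3 }, seq2 = { 3, 4, 5 };

var concat      = seq1.Concat (seq2).ToList();      // 1, 2, 3, 3, 4, 5
var union       = seq1.Union (seq2).ToList();       // 1, 2, 3, 4, 5
var commonality = seq1.Intersect (seq2).ToList();   // 3
var difference1 = seq1.Except (seq2).ToList();      // 1, 2
var difference2 = seq2.Except (seq1).ToList();      // 4, 5

// Duplicates in the first sequence are removed too
var deduped = new[] { 1, 1, 2 }.Except (new[] { 2 }).ToList();   // 1

Console.WriteLine (string.Join (", ", concat));
```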


Conversion Methods 


LINQ deals primarily in sequences; in other words, collections of type 
IEnumerable<T>. The conversion methods convert to and from other types of 
collections: 


Method         Description
OfType         Converts IEnumerable to IEnumerable<T>, discarding wrongly typed elements
Cast           Converts IEnumerable to IEnumerable<T>, throwing an exception if there are any
               wrongly typed elements
ToArray        Converts IEnumerable<T> to T[]
ToList         Converts IEnumerable<T> to List<T>
ToDictionary   Converts IEnumerable<T> to Dictionary<TKey,TValue>
ToLookup       Converts IEnumerable<T> to ILookup<TKey,TElement>
AsEnumerable   Upcasts to IEnumerable<T>
AsQueryable    Casts or converts to IQueryable<T>





OfType and Cast


OfType and Cast accept a nongeneric IEnumerable collection and emit a generic 
IEnumerable<T> sequence that you can subsequently query: 


ArrayList classicList = new ArrayList();          // in System.Collections
classicList.AddRange ( new int[] { 3, 4, 5 } );
IEnumerable<int> sequence1 = classicList.Cast<int>();
Cast and OfType differ in their behavior when encountering an input element that's
of an incompatible type. Cast throws an exception; OfType ignores the incompatible
element. Continuing the preceding example:


DateTime offender = DateTime.Now;
classicList.Add (offender);
IEnumerable<int>
  sequence2 = classicList.OfType<int>(), // OK - ignores offending DateTime
  sequence3 = classicList.Cast<int>();   // Throws exception
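Note that the exception from Cast surfaces only when the sequence is enumerated, because both operators use deferred execution. This sketch makes that explicit by forcing enumeration with ToList inside a try block.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

var classicList = new ArrayList();
classicList.AddRange (new int[] { 3, 4, 5 });
classicList.Add (DateTime.Now);   // the offending element

IEnumerable<int> sequence2 = classicList.OfType<int>();
IEnumerable<int> sequence3 = classicList.Cast<int>();   // deferred: nothing thrown yet

Console.WriteLine (string.Join (", ", sequence2));      // 3, 4, 5

bool threw = false;
try { sequence3.ToList(); }                             // enumeration hits the DateTime
catch (InvalidCastException) { threw = true; }
Console.WriteLine (threw);                              // True
```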


The rules for element compatibility exactly follow those of C#'s is operator, and
therefore consider only reference conversions and unboxing conversions. We can
see this by examining the internal implementation of OfType:


public static IEnumerable<TSource> OfType <TSource> (IEnumerable source)
{
  foreach (object element in source)
    if (element is TSource)
      yield return (TSource)element;
}


Cast has an identical implementation, except that it omits the type compatibility
test:


public static IEnumerable<TSource> Cast <TSource> (IEnumerable source)
{
  foreach (object element in source)
    yield return (TSource)element;
}


A consequence of these implementations is that you cannot use Cast to perform 
numeric or custom conversions (for these, you must perform a Select operation 
instead). In other words, Cast is not as flexible as C#’s cast operator: 

int i = 3;
long l = i;         // Implicit numeric conversion int->long
int i2 = (int) l;   // Explicit numeric conversion long->int

We can demonstrate this by attempting to use OfType or Cast to convert a sequence
of ints to a sequence of longs:


int[] integers = { 1, 2, 3 };

IEnumerable<long> test1 = integers.OfType<long>();
IEnumerable<long> test2 = integers.Cast<long>();


When enumerated, test1 emits zero elements and test2 throws an exception.
Examining OfType's implementation, it's fairly clear why. After substituting TSource,
we get the following expression:


(element is long) 


This returns false for an int element, due to the lack of an inheritance 
relationship. 
The reason for test2 throwing an exception when enumerated
is subtler. Notice in Cast's implementation that element
is of type object. When TSource is a value type, the CLR
assumes this is an unboxing conversion, and synthesizes a
method that reproduces the scenario described in the section
“Boxing and Unboxing” on page 117 in Chapter 3:


int value = 123; 
object element = value; 
long result = (long) element; // exception 


Because the element variable is declared of type object, an 
object-to-long cast is performed (an unboxing) rather than 
an int-to-long numeric conversion. Unboxing operations 
require an exact type match, so the object-to-long unbox 
fails when given an int. 


As we suggested previously, the solution is to use an ordinary Select: 
IEnumerable<long> castLong = integers.Select (s => (long) s); 
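The three approaches can be compared in one place. This sketch confirms the behavior described above: OfType<long> finds nothing, Cast<long> fails with InvalidCastException once enumerated, and Select performs a genuine numeric conversion.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Collections.Generic;
using System.Linq;

int[] integers = { 1, 2, 3 };

var test1 = integers.OfType<long>().ToList();      // empty: a boxed int is never 'is long'
IEnumerable<long> test2 = integers.Cast<long>();   // deferred; fails when enumerated

bool castFailed = false;
try { test2.ToList(); }
catch (InvalidCastException) { castFailed = true; }   // object-to-long unbox of an int

// Select applies a real int->long numeric conversion to each element
var castLong = integers.Select (s => (long) s).ToList();

Console.WriteLine (test1.Count + " " + castFailed + " " + castLong.Sum());   // 0 True 6
```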


OfType and Cast are also useful in downcasting elements in a generic input 
sequence. For instance, if you have an input sequence of type IEnumerable<Fruit>, 
OfType<Apple> would return just the apples. This is particularly useful in LINQ to 
XML (see Chapter 10). 


Cast has query syntax support: simply precede the range variable with a type: 


from TreeNode node in myTreeView.Nodes 


ToArray, ToList, ToDictionary, ToHashSet, and ToLookup 


ToArray, ToList, and ToHashSet emit the results into an array, a List<T>, or a
HashSet<T>. When they execute, these operators force the immediate enumeration
of the input sequence. For examples, refer to “Deferred Execution” on page 415 in
Chapter 8.


ToDictionary and ToLookup accept the following arguments: 


Argument                      Type
Input sequence                IEnumerable<TSource>
Key selector                  TSource => TKey
Element selector (optional)   TSource => TElement
Comparer (optional)           IEqualityComparer<TKey>





ToDictionary also forces immediate execution of a sequence, writing the results to
a generic Dictionary. The keySelector expression you provide must evaluate to a
unique value for each element in the input sequence; otherwise, an exception is
thrown. In contrast, ToLookup allows many elements of the same key. We described
lookups in “Joining with lookups” on page 447.
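The uniqueness requirement is easy to trip over. In this sketch (names assumed from earlier in the chapter), keying by name works, keying ToDictionary by length throws ArgumentException because Tom/Jay and Dick/Mary share lengths, and ToLookup accepts the same key selector without complaint.

```csharp
// Top-level program (C# 9 or later). names is assumed from earlier in the chapter.
using System;
using System.Linq;

string[] names = { "Tom", "Dick", "Harry", "Mary", "Jay" };

// Unique keys: fine for ToDictionary
var lengthByName = names.ToDictionary (n => n, n => n.Length);

// Duplicate keys make ToDictionary throw...
bool threw = false;
try { names.ToDictionary (n => n.Length); }
catch (ArgumentException) { threw = true; }

// ...whereas ToLookup keeps every element under its key
var namesByLength = names.ToLookup (n => n.Length);

Console.WriteLine (lengthByName ["Harry"]);                  // 5
Console.WriteLine (string.Join (", ", namesByLength [4]));   // Dick, Mary
```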
AsEnumerable and AsQueryable 


AsEnumerable upcasts a sequence to IEnumerable<T>, forcing the compiler to bind 
subsequent query operators to methods in Enumerable instead of Queryable. For an 
example, see “Combining Interpreted and Local Queries” on page 402 in Chapter 8. 


AsQueryable downcasts a sequence to IQueryable<T> if it implements that
interface. Otherwise, it instantiates an IQueryable<T> wrapper over the local query.


Element Operators 


IEnumerable<TSource>→TSource


Method                 Description                                   SQL equivalents
First, FirstOrDefault  Returns the first element in the sequence,    SELECT TOP 1 ...
                       optionally satisfying a predicate             ORDER BY ...
Last, LastOrDefault    Returns the last element in the sequence,     SELECT TOP 1 ...
                       optionally satisfying a predicate             ORDER BY ... DESC
Single,                Equivalent to First/FirstOrDefault, but
SingleOrDefault        throws an exception if there is more than
                       one match
ElementAt,             Returns the element at the specified          Exception thrown
ElementAtOrDefault     position
DefaultIfEmpty         Returns a single-element sequence whose       OUTER JOIN
                       value is default(TSource) if the sequence
                       has no elements





Methods ending in “OrDefault” return default(TSource) rather than throwing an 
exception if the input sequence is empty or if no elements match the supplied 
predicate. 


default(TSource) is null for reference type elements, false for the bool type, and 
zero for numeric types. 


First, Last, and Single 


Argument               Type
Source sequence        IEnumerable<TSource>
Predicate (optional)   TSource => bool





The following example demonstrates First and Last: 
int[] numbers = { 1, 2, 3, 4, 5 };


int first = numbers.First();                       // 1
int last = numbers.Last();                         // 5
int firstEven = numbers.First (n => n % 2 == 0);   // 2
int lastEven = numbers.Last (n => n % 2 == 0);     // 4


The following demonstrates First versus FirstOrDefault: 


int firstBigError = numbers.First (n => n > 10); // Exception 
int firstBigNumber = numbers.FirstOrDefault (n => n > 10); // 0 


To avoid an exception, Single requires exactly one matching element; 
SingleOrDefault requires one or zero matching elements: 


int onlyDivBy3 = numbers.Single (n => n % 3 == 0);   // 3
int divBy2Err = numbers.Single (n => n % 2 == 0);    // Error: 2 & 4 match

int singleError = numbers.Single (n => n > 10);                // Error
int noMatches = numbers.SingleOrDefault (n => n > 10);         // 0
int divBy2Error = numbers.SingleOrDefault (n => n % 2 == 0);   // Error


Single is the “fussiest” in this family of element operators. FirstOrDefault and 
LastOrDefault are the most tolerant. 


In EF Core, Single is often used to retrieve a row from a table by primary key: 


Customer cust = dbContext.Customers.Single (c => c.ID == 3);

ElementAt

Argument                      Type
Source sequence               IEnumerable<TSource>
Index of element to return    int





ElementAt picks the nth element from the sequence:


int[] numbers = { 1, 2, 3, 4, 5 }; 


int third = numbers.ElementAt (2); // 3 
int tenthError = numbers.ElementAt (9); // Exception 
int tenth = numbers.ElementAtOrDefault (9);   // 0


Enumerable.ElementAt is written such that if the input sequence happens to
implement IList<T>, it calls IList<T>'s indexer. Otherwise, it enumerates n times, and
then returns the next element. ElementAt is not supported in EF Core.


DefaultIfEmpty


DefaultIfEmpty returns a sequence containing a single element whose value is 
default(TSource) if the input sequence has no elements; otherwise, it returns the 
input sequence unchanged. You use this in writing flat outer joins: see “Outer joins 
with SelectMany” on page 440 and “Flat outer joins” on page 447. 
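For value types, the single default element is the type's zero value, which can be ambiguous. This sketch also uses the overload that accepts an explicit default value, which avoids that ambiguity.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Linq;

int[] empty = { };
int[] some = { 1, 2 };

var a = empty.DefaultIfEmpty().ToList();      // single element: default(int), i.e. 0
var b = empty.DefaultIfEmpty (-1).ToList();   // single element: the supplied value, -1
var c = some.DefaultIfEmpty().ToList();       // non-empty input passes through: 1, 2

Console.WriteLine (a[0] + " " + b[0] + " " + c.Count);   // 0 -1 2
```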
Aggregation Methods


IEnumerable<TSource>→scalar





Method            Description                                        SQL equivalents
Count, LongCount  Returns the number of elements in the input        COUNT (...)
                  sequence, optionally satisfying a predicate
Min, Max          Returns the smallest or largest element in the     MIN (...), MAX (...)
                  sequence
Sum, Average      Calculates a numeric sum or average over elements  SUM (...), AVG (...)
                  in the sequence
Aggregate         Performs a custom aggregation                      Exception thrown

Count and LongCount 


Argument               Type
Source sequence        IEnumerable<TSource>
Predicate (optional)   TSource => bool





Count simply enumerates over a sequence, returning the number of items: 
int fullCount = new int[] { 5, 6, 7 }.Count(); // 3 


The internal implementation of Enumerable.Count tests the input sequence to see 
whether it happens to implement ICollection<T>. If it does, it simply calls 
ICollection<T>.Count; otherwise, it enumerates over every item, incrementing a 
counter. 


You can optionally supply a predicate: 
int digitCount = "pa55w0rd".Count (c => char.IsDigit (c));   // 3


LongCount does the same job as Count, but returns a 64-bit integer, allowing for 
sequences of greater than two billion elements. 


Min and Max 
Argument                     Type
Source sequence              IEnumerable<TSource>
Result selector (optional)   TSource => TResult





Min and Max return the smallest or largest element from a sequence: 


int[] numbers = { 28, 32, 14 }; 
int smallest = numbers.Min(); // 14; 
int largest = numbers.Max(); // 32; 
If you include a selector expression, each element is first projected:
int maxRemainder = numbers.Max (n => n % 10);   // 8


A selector expression is mandatory if the items themselves are not intrinsically 
comparable—in other words, if they do not implement IComparable<T>: 


Purchase runtimeError = dbContext.Purchases.Min (); // Error 
decimal? lowestPrice = dbContext.Purchases.Min (p => p.Price); // OK 


A selector expression determines not only how elements are compared, but also 
the final result. In the preceding example, the final result is a decimal value, not a 
purchase object. To get the cheapest purchase, you need a subquery: 


Purchase cheapest = dbContext.Purchases 
.Where (p => p.Price == dbContext.Purchases.Min (p2 => p2.Price)) 
.FirstOrDefault(); 


In this case, you could also formulate the query without an aggregation by using an 
OrderBy followed by FirstOrDefault. 
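Both formulations can be tried against an in-memory array of hypothetical purchases. Note that in a local query the subquery version re-evaluates Min for every element, whereas the OrderBy version sorts once; against a database, both are translated to SQL instead.

```csharp
// Top-level program (C# 9 or later); the purchases are hypothetical.
using System;
using System.Linq;

var purchases = new[]
{
    new { Description = "Bike",    Price = 500m },
    new { Description = "Phone",   Price = 300m },
    new { Description = "Holiday", Price = 2000m }
};

// Min with a selector returns the projected value (a decimal), not a purchase
decimal lowestPrice = purchases.Min (p => p.Price);   // 300

// Subquery formulation (locally, Min is recomputed for each element)
var cheapest1 = purchases.Where (p => p.Price == purchases.Min (p2 => p2.Price)).FirstOrDefault();

// OrderBy + FirstOrDefault formulation
var cheapest2 = purchases.OrderBy (p => p.Price).FirstOrDefault();

Console.WriteLine (cheapest1.Description + " " + cheapest2.Description);   // Phone Phone
```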


Sum and Average 


Argument                     Type
Source sequence              IEnumerable<TSource>
Result selector (optional)   TSource => TResult





Sum and Average are aggregation operators that are used in a similar manner to Min 
and Max: 


decimal[] numbers = { 3, 4, 8 }; 
decimal sumTotal = numbers.Sum(); // 15 
decimal average = numbers.Average(); // 5 (mean value) 


The following returns the combined length of all the strings in the names array:
int combinedLength = names.Sum (s => s.Length);   // 19


Sum and Average are fairly restrictive in their typing. Their definitions are
hardwired to each of the numeric types (int, long, float, double, decimal, and their
nullable versions). In contrast, Min and Max can operate directly on anything that
implements IComparable<T>—such as a string, for instance.


Further, Average always returns either decimal, float, or double, according to the 
following table: 


Selector type Result type 


decimal decimal 
float float 
int, long, double double 
This means that the following does not compile (“cannot convert double to int”):
int avg = new int[] { 3, 4 }.Average();

But this will compile:
double avg = new int[] { 3, 4 }.Average();   // 3.5


Average implicitly upscales the input values to avoid loss of precision. In this
example, we averaged integers and got 3.5 without needing to resort to an input
element cast such as this one:


double avg = numbers.Average (n => (double) n);
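The result-type table can be confirmed by assigning each Average result to the type the table predicts; if the prediction were wrong, these lines would not compile without a cast.

```csharp
// Top-level program (C# 9 or later).
using System;
using System.Linq;

int[] ints = { 3, 4 };
double avg = ints.Average();             // 3.5: the int overload returns double

decimal[] decimals = { 3, 4, 8 };
decimal decAvg = decimals.Average();     // 5: the decimal overload returns decimal

float[] floats = { 1f, 2f };
float floatAvg = floats.Average();       // 1.5: the float overload returns float

Console.WriteLine (avg == 3.5 && decAvg == 5m && floatAvg == 1.5f);   // True
```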


When querying a database, Sum and Average translate to the standard SQL
aggregations. The following query returns customers whose average purchase was more
than $500:


from c in dbContext.Customers 
where c.Purchases.Average (p => p.Price) > 500 
select c.Name; 


Aggregate 


Aggregate allows you to specify a custom accumulation algorithm for implementing
unusual aggregations. Aggregate is not supported in EF Core and is somewhat
specialized in its use cases. The following demonstrates how Aggregate can do the
work of Sum:


int[] numbers = { 1, 2, 3 }; 
int sum = numbers.Aggregate (0, (total, n) => total + n);   // 6


The first argument to Aggregate is the seed, from which accumulation starts. The 
second argument is an expression to update the accumulated value, given a fresh 
element. You can optionally supply a third argument to project the final result value 
from the accumulated value. 


Most problems for which Aggregate has been designed can be 
solved as easily with a foreach loop—and with more familiar 
syntax. The advantage of using Aggregate is that with large or 
complex aggregations, you can automatically parallelize the 
operation with PLINQ (see Chapter 23). 
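To make the seed, accumulator, and result-selector roles concrete, the following sketch calls the three-argument overload and compares it with an equivalent foreach loop. AggregateLoop is a hypothetical helper written for this comparison; it is not part of LINQ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3 };

// LINQ: seed 0, add each element, then project the final total into a string
string viaLinq = numbers.Aggregate (
  0,                                 // seed
  (total, n) => total + n,           // accumulator, applied once per element
  total => $"Total = {total}");      // result selector, applied once at the end

// The same computation expressed as an ordinary loop
string viaLoop = AggregateLoop (numbers, 0, (total, n) => total + n, total => $"Total = {total}");

Console.WriteLine (viaLinq);   // Total = 6
Console.WriteLine (viaLoop);   // Total = 6

// Hypothetical helper that mirrors the seeded Aggregate overload with a foreach loop
static TResult AggregateLoop<TSource, TAccumulate, TResult> (
  IEnumerable<TSource> source,
  TAccumulate seed,
  Func<TAccumulate, TSource, TAccumulate> func,
  Func<TAccumulate, TResult> resultSelector)
{
  TAccumulate acc = seed;
  foreach (TSource element in source)
    acc = func (acc, element);       // fold each element into the accumulator
  return resultSelector (acc);       // project the accumulated value
}
```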


Unseeded aggregations 


You can omit the seed value when calling Aggregate, in which case the first element 
becomes the implicit seed, and aggregation proceeds from the second element. 
Here's the preceding example, unseeded: 


int[] numbers = { 1, 2, 3 }; 
int sum = numbers.Aggregate ((total, n) => total + n);   // 6
This gives the same result as before, but we're actually doing a different calculation.
Before, we were calculating 0+1+2+3; now we're calculating 1+2+3. We can better
illustrate the difference by multiplying instead of adding:


int[] numbers = { 1, 2, 3 };
int x = numbers.Aggregate (0, (prod, n) => prod * n);   // 0*1*2*3 = 0
int y = numbers.Aggregate (   (prod, n) => prod * n);   // 1*2*3 = 6
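The following sketch spells out what the unseeded overload does, using a hypothetical helper named AggregateUnseeded (not a LINQ method). Like Enumerable.Aggregate, it takes the first element as the implicit seed and throws InvalidOperationException if the sequence is empty, since there’s no element to start from:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 2, 3 };

int seeded   = numbers.Aggregate (0, (prod, n) => prod * n);          // 0*1*2*3 = 0
int unseeded = numbers.Aggregate ((prod, n) => prod * n);             // 1*2*3 = 6
int manual   = AggregateUnseeded (numbers, (prod, n) => prod * n);    // 1*2*3 = 6

Console.WriteLine ($"{seeded} {unseeded} {manual}");   // 0 6 6

// Hypothetical helper: the first element becomes the implicit seed,
// and accumulation proceeds from the second element.
static T AggregateUnseeded<T> (IEnumerable<T> source, Func<T, T, T> func)
{
  using IEnumerator<T> e = source.GetEnumerator();
  if (!e.MoveNext())
    throw new InvalidOperationException ("Sequence contains no elements");
  T acc = e.Current;                 // implicit seed
  while (e.MoveNext())
    acc = func (acc, e.Current);     // continue from the second element
  return acc;
}
```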


As you'll see in Chapter 23, unseeded aggregations have the advantage of being par- 
allelizable without requiring the use of special overloads. However, there are some 
traps with unseeded aggregations. 


Traps with unseeded aggregations 


The unseeded aggregation methods are intended for use with delegates that are 
commutative and associative. If used otherwise, the result is either unintuitive (with 
ordinary queries) or nondeterministic (in the case that you parallelize the query with 
PLINQ). For example, consider the following function: 


(total, n) => total + n * n


This is neither commutative nor associative. (For example, 1+2*2 != 2+1*1.) Let’s
see what happens when we use it to sum the squares of the numbers 2, 3, and 4:


int[] numbers = { 2, 3, 4 }; 
int sum = numbers.Aggregate ((total, n) => total + n * n);   // 27


Instead of calculating
2*2 + 3*3 + 4*4   // 29

it calculates:
2 + 3*3 + 4*4     // 27

We can fix this in a number of ways. First, we could include 0 as the first element: 
int[] numbers = { 0, 2, 3, 4 }; 


Not only is this inelegant, but it will still give incorrect results if parallelized— 
because PLINQ uses the function’s assumed associativity by selecting multiple ele- 
ments as seeds. To illustrate, if we denote our aggregation function as follows: 


f(total, n) => total + n * n

LINQ to Objects would calculate this:
f(f(f(0, 2), 3), 4)

whereas PLINQ might do this:
f(f(0, 2), f(3, 4))

with the following result: 


First partition:    a = 0 + 2*2   (= 4)
Second partition:   b = 3 + 4*4   (= 19)
Final result:       a + b*b       (= 365)
OR EVEN:            b + a*a       (= 35)
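You can reproduce this arithmetic without PLINQ by invoking the function by hand. In this sketch, the split into the partitions {0, 2} and {3, 4} is an assumption made for illustration; PLINQ decides its own partitioning at runtime, which is exactly why the result can vary:

```csharp
using System;
using System.Linq;

Func<int, int, int> f = (total, n) => total + n * n;

int sequential = new[] { 0, 2, 3, 4 }.Aggregate (f);   // f(f(f(0,2),3),4) = 29
int unseeded   = new[] { 2, 3, 4 }.Aggregate (f);      // f(f(2,3),4) = 27

// Simulated two-way partitioning (a manual illustration, not real PLINQ)
int a = f (0, 2);            // first partition:  0 + 2*2 = 4
int b = f (3, 4);            // second partition: 3 + 4*4 = 19
int combined1 = f (a, b);    // a + b*b = 365
int combined2 = f (b, a);    // b + a*a = 35

Console.WriteLine ($"{sequential} {unseeded} {combined1} {combined2}");   // 29 27 365 35
```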
There are two good solutions. The first is to turn this into a seeded aggregation with 
zero as the seed. The only complication is that with PLINQ, we'd need to use a
special overload in order for the query not to execute sequentially (see “Optimizing
PLINQ” on page 934 in Chapter 23). 


The second solution is to restructure the query such that the aggregation function is 
commutative and associative: 


int sum = numbers.Select (n => n * n).Aggregate ((total, n) => total + n);
Of course, in such simple scenarios you can (and should) use 
the Sum operator instead of Aggregate: 
int sum = numbers.Sum (n => n * n); 
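Here’s a sketch that puts both solutions side by side, together with the Sum shortcut, on the numbers 2, 3, and 4. The AsParallel line is safe only because the restructured aggregation (plain addition) is commutative and associative:

```csharp
using System;
using System.Linq;

int[] numbers = { 2, 3, 4 };

// Solution 1: a seeded aggregation with zero as the seed (sequential LINQ to Objects)
int seeded = numbers.Aggregate (0, (total, n) => total + n * n);

// Solution 2: square first, then aggregate with a commutative, associative function
int restructured = numbers.Select (n => n * n).Aggregate ((total, n) => total + n);

// Solution 2 can be parallelized, because addition doesn't care about grouping or order
int parallel = numbers.AsParallel().Select (n => n * n).Aggregate ((total, n) => total + n);

// Simplest of all
int viaSum = numbers.Sum (n => n * n);

Console.WriteLine ($"{seeded} {restructured} {parallel} {viaSum}");   // 29 29 29 29
```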


You can actually go quite far just with Sum and Average. For 
instance, you can use Average to calculate a root-mean- 
square: 

Math.Sqrt (numbers.Average (n => n * n)) 


You can even calculate standard deviation: 


double mean = numbers.Average();
double sdev = Math.Sqrt (numbers.Average (n =>
              {
                double dif = n - mean;
                return dif * dif;
              }));
Both are safe, efficient, and fully parallelizable. In Chapter 23, 
we give a practical example of a custom aggregation that can't 
be reduced to Sum or Average. 
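Here are the root-mean-square and standard-deviation expressions from the preceding note applied to some made-up data. Because the squared deviations are averaged (divided by the count), this computes the population standard deviation rather than the sample standard deviation:

```csharp
using System;
using System.Linq;

int[] numbers = { 2, 4, 4, 4, 5, 5, 7, 9 };   // sample data

// Root-mean-square: square each value, average, then take the square root
double rms = Math.Sqrt (numbers.Average (n => n * n));   // sqrt(232 / 8) = sqrt(29)

// Standard deviation: average squared deviation from the mean, then the square root
double mean = numbers.Average();                          // 40 / 8 = 5
double sdev = Math.Sqrt (numbers.Average (n =>
              {
                double dif = n - mean;
                return dif * dif;
              }));                                        // sqrt(32 / 8) = 2

Console.WriteLine (sdev);   // 2
```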


Quantifiers 


IEnumerable<TSource> → bool


Method          Description                                                      SQL equivalents
Contains        Returns true if the input sequence contains the given element    WHERE ... IN (...)
Any             Returns true if any elements satisfy the given predicate         WHERE ... IN (...)
All             Returns true if all elements satisfy the given predicate         WHERE (...)
SequenceEqual   Returns true if the second sequence has identical elements
                to the input sequence





Contains and Any 


The Contains method accepts an argument of type TSource; Any accepts an 
optional predicate. 


Contains returns true if the given element is present: 
bool hasAThree = new int[] { 2, 3, 4 }.Contains (3);   // true;


Any returns true if the given expression is true for at least one element. We can 
rewrite the preceding query with Any as follows: 


bool hasAThree = new int[] { 2, 3, 4 }.Any (n => n == 3); // true; 
Any can do everything that Contains can do, and more: 
bool hasABigNumber = new int[] { 2, 3, 4 }.Any (n => n > 10);   // false;


Calling Any without a predicate returns true if the sequence has one or more ele- 
ments. Here’s another way to write the preceding query: 


bool hasABigNumber = new int[] { 2, 3, 4 }.Where (n => n > 10).Any(); 


Any is particularly useful in subqueries and is used often when querying databases; 
for example: 


from c in dbContext.Customers 
where c.Purchases.Any (p => p.Price > 1000) 
select c 


All and SequenceEqual 


All returns true if all elements satisfy a predicate. The following returns customers
whose purchases are all less than $100:


dbContext.Customers.Where (c => c.Purchases.All (p => p.Price < 100));


SequenceEqual compares two sequences. To return true, each sequence must have 
identical elements, in the identical order. You can optionally provide an equality 
comparer; the default is EqualityComparer<T>.Default. 
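A short sketch of All and SequenceEqual on in-memory arrays (the data is made up). StringComparer.OrdinalIgnoreCase is used here simply as an example of a custom IEqualityComparer<string>:

```csharp
using System;
using System.Linq;

int[] prices = { 20, 50, 99 };
bool allCheap = prices.All (p => p < 100);    // true
bool anyDear  = prices.Any (p => p >= 100);   // false: All (pred) equals !Any (x => !pred (x))

string[] a = { "one", "two" };
string[] b = { "ONE", "TWO" };

bool exact     = a.SequenceEqual (b);                                    // false: default comparer is case-sensitive
bool caseless  = a.SequenceEqual (b, StringComparer.OrdinalIgnoreCase);  // true: custom equality comparer
bool reordered = a.SequenceEqual (new[] { "two", "one" });               // false: order matters
```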


Generation Methods 


void → IEnumerable<TResult>


Method    Description

Empty     Creates an empty sequence
Repeat    Creates a sequence of repeating elements
Range     Creates a sequence of integers





Empty, Repeat, and Range are static (non-extension) methods that manufacture sim- 
ple local sequences. 


Empty 
Empty manufactures an empty sequence and requires just a type argument: 


foreach (string s in Enumerable.Empty<string>()) 
Console.Write (s); // <nothing> 
In conjunction with the ?? operator, Empty does the reverse of DefaultIfEmpty. For
example, suppose that we have a jagged array of integers and we want to get all the 
integers into a single flat list. The following SelectMany query fails if any of the 
inner arrays is null: 


int[][] numbers =
{
  new int[] { 1, 2, 3 },
  new int[] { 4, 5, 6 },
  null                     // this null makes the query below fail.
};

IEnumerable<int> flat = numbers.SelectMany (innerArray => innerArray);
Empty in conjunction with ?? fixes the problem: 


IEnumerable<int> flat = numbers
  .SelectMany (innerArray => innerArray ?? Enumerable.Empty<int>());

foreach (int i in flat)
  Console.Write (i + " ");     // 1 2 3 4 5 6


Range and Repeat 
Range accepts a starting index and count (both integers): 


foreach (int i in Enumerable.Range (5, 3))
  Console.Write (i + " ");     // 5 6 7


Repeat accepts an element to repeat, and the number of repetitions: 


foreach (bool x in Enumerable.Repeat (true, 3)) 
Console.Write (x +" "); // True True True 
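The generation methods are often combined with other operators. This sketch (illustrative data) highlights that Range’s second argument is a count rather than an upper bound, and repeats the Empty technique from the jagged-array example:

```csharp
using System;
using System.Linq;

// Range (start, count): the second argument is a count, not an upper bound
int[] fiveToSeven = Enumerable.Range (5, 3).ToArray();                   // 5, 6, 7

// Range combined with Select: the first five squares
int[] squares = Enumerable.Range (1, 5).Select (i => i * i).ToArray();   // 1, 4, 9, 16, 25

// Repeat: three copies of the same element, joined into one string
string dashes = string.Concat (Enumerable.Repeat ("-", 3));              // "---"

// Empty as a null-safe fallback when flattening a jagged array
int[][] jagged = { new[] { 1, 2 }, null, new[] { 3 } };
int[] flat = jagged.SelectMany (inner => inner ?? Enumerable.Empty<int>()).ToArray();

Console.WriteLine (string.Join (" ", flat));   // 1 2 3
```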







10 


LINQ to XML 








.NET Core provides a number of APIs for working with XML data. The primary 
choice for general-purpose XML document processing is LINQ to XML. LINQ to 
XML comprises a lightweight, LINQ-friendly XML document object model, plus a 
set of supplementary query operators. 


In this chapter, we concentrate entirely on LINQ to XML. In Chapter 11, we cover 
the forward-only XML reader/writer, and in the online supplement, we cover the 
types for working with schemas and stylesheets. .NET Core also includes the legacy
XmlDocument-based DOM, which we don't cover. 


The LINQ to XML Document Object Model (DOM) is 
extremely well designed and highly performant. Even without 
LINQ, the LINQ to XML DOM is valuable as a lightweight 
facade over the low-level XmlReader and XmlWriter classes. 


All LINQ to XML types are defined in the System.Xml.Linq namespace.


Architectural Overview 


This section starts with a very brief introduction to the concept of a DOM, and then 
explains the rationale behind LINQ to XML’s DOM. 


What Is a DOM? 
Consider the following XML file: 


<?xml version="1.0" encoding="utf-8"?> 
<customer id="123" status="archived"> 
<firstname>Joe</firstname> 
<lastname>Bloggs</lastname> 
</customer> 
As with all XML files, we start with a declaration and then a root element, whose
name is customer. The customer element has two attributes, each with a name (id 
and status) and value ("123" and "archived"). Within customer, there are two 
child elements, firstname and lastname, each having simple text content ("Joe" 
and "Bloggs"). 


Each of these constructs—declaration, element, attribute, value, and text content— 
can be represented with a class. And if such classes have collection properties for 
storing child content, we can assemble a tree of objects to fully describe a document. 
This is called a Document Object Model, or DOM. 


The LINQ to XML DOM 
LINQ to XML comprises two things: 


• An XML DOM, which we call the X-DOM

• A set of about 10 supplementary query operators


As you might expect, the X-DOM consists of types such as XDocument, XElement, 
and XAttribute. Interestingly, the X-DOM types are not tied to LINQ—you can 
load, instantiate, update, and save an X-DOM without ever writing a LINQ query. 


Conversely, you could use LINQ to query a DOM built from the older W3C-compliant
types. However, this would be frustrating and limiting. The distinguishing
feature of the X-DOM is that it’s LINQ-friendly, meaning:


• It has methods that emit useful IEnumerable sequences upon which you can
  query.

• Its constructors are designed such that you can build an X-DOM tree through a
  LINQ projection.


X-DOM Overview 


Figure 10-1 shows the core X-DOM types. The most frequently used of these types 
is XElement. XObject is the root of the inheritance hierarchy; XElement and 
XDocument are roots of the containership hierarchy. 
Figure 10-1. Core X-DOM types
Figure 10-2 shows the X-DOM tree created from the following code:


string xml = @"<customer id='123' status='archived'>
                 <firstname>Joe</firstname>
                 <lastname>Bloggs<!--nice name--></lastname>
               </customer>";

XElement customer = XElement.Parse (xml);





Figure 10-2. A simple X-DOM tree
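The tree in Figure 10-2 can be explored in code. This sketch parses the same XML and prints each attribute, each child element, and the runtime type of every node beneath it; attributes come from Attributes(), while text and comments come from Nodes():

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string xml = @"<customer id='123' status='archived'>
                 <firstname>Joe</firstname>
                 <lastname>Bloggs<!--nice name--></lastname>
               </customer>";

XElement customer = XElement.Parse (xml);

// Attributes sit in their own collection, separate from child nodes
foreach (XAttribute attr in customer.Attributes())
  Console.WriteLine ($"XAttribute {attr.Name} = {attr.Value}");

// Child content is a sequence of XNode objects
foreach (XElement child in customer.Elements())
{
  Console.WriteLine ($"XElement {child.Name}");
  foreach (XNode node in child.Nodes())
    Console.WriteLine ($"  {node.GetType().Name}: {node}");
}
```

For lastname, the nodes reported are an XText ("Bloggs") followed by an XComment, matching the figure.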
XObject is the abstract base class for all XML content. It defines a link to the Parent
element in the containership tree as well as an optional XDocument. 


XNode is the base class for most XML content excluding attributes. The distinguish- 
ing feature of XNode is that it can sit in an ordered collection of mixed-type XNodes. 
For instance, consider the following XML: 


<data> 
Hello world 
<subelement1/> 
<!--comment--> 
<subelement2/> 
</data> 


Within the parent element <data>, there's first an XText node (Hello world), then 
an XElement node, then an XComment node, and then a second XElement node. In 
contrast, an XAttribute will tolerate only other XAttributes as peers. 
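Here’s a quick way to confirm the mixed-content ordering described above. This sketch parses the same <data> fragment and prints the runtime type of each child node. Parse’s default options discard whitespace-only text between elements, so only the four nodes mentioned remain:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement data = XElement.Parse (
@"<data>
  Hello world
  <subelement1/>
  <!--comment-->
  <subelement2/>
</data>");

// Nodes() returns the mixed-type children in document order
foreach (XNode node in data.Nodes())
  Console.WriteLine (node.GetType().Name);   // XText, XElement, XComment, XElement (one per line)

// Attributes never appear in Nodes(); they live in Attributes()
Console.WriteLine (data.Attributes().Count());   // 0
```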


Although an XNode can access its parent XElement, it has no concept of child nodes: 
this is the job of its subclass XContainer. XContainer defines members for dealing 
with children and is the abstract base class for XElement and XDocument.


XElement introduces members for managing attributes—as well as a Name and 
Value. In the (fairly common) case of an element having a single XText child node, 
the Value property on XElement encapsulates this child’s content for both get and 
set operations, cutting unnecessary navigation. Thanks to Value, you can mostly 
avoid working directly with XText nodes. 


XDocument represents the root of an XML tree. More precisely, it wraps the root 
XElement, adding an XDeclaration, processing instructions, and other root-level 
“fluff.” Unlike with the W3C DOM, its use is optional: you can load, manipulate,
and save an X-DOM without ever creating an XDocument! The nonreliance on 
XDocument also means that you can efficiently and easily move a node subtree to 
another X-DOM hierarchy. 

Loading and Parsing 

Both XElement and XDocument provide static Load and Parse methods to build an 
X-DOM tree from an existing source: 


• Load builds an X-DOM from a file, URI, Stream, TextReader, or XmlReader.

• Parse builds an X-DOM from a string.
For example: 
XDocument fromWeb = XDocument.Load ("http://albahari.com/sample.xml");
XElement fromFile = XElement.Load (@"e:\media\somefile.xml");

XElement config = XElement.Parse (
@"<configuration>
    <client enabled='true'>
      <timeout>30</timeout>
    </client>
  </configuration>");
In later sections, we describe how to traverse and update an X-DOM. As a quick
preview, here's how to manipulate the config element we just populated: 





foreach (XElement child in config.Elements())
  Console.WriteLine (child.Name);                    // client

XElement client = config.Element ("client");

bool enabled = (bool) client.Attribute ("enabled");  // Read attribute
Console.WriteLine (enabled);                         // True
client.Attribute ("enabled").SetValue (!enabled);    // Update attribute

int timeout = (int) client.Element ("timeout");      // Read element
Console.WriteLine (timeout);                         // 30
client.Element ("timeout").SetValue (timeout * 2);   // Update element

client.Add (new XElement ("retries", 3));            // Add new element

Console.WriteLine (config);         // Implicitly call config.ToString()
Here's the result of that last Console.WriteLine:


<configuration> 
<client enabled="false"> 
<timeout>60</timeout> 
<retries>3</retries> 
</client> 
</configuration> 


XNode also provides a static ReadFrom method that instantiates 
and populates any type of node from an XmlReader. Unlike 
Load, it stops after reading one (complete) node, so you can 
continue to read manually from the XmlReader afterward. 


You can also do the reverse and use an XmlReader or 
XmlWriter to read or write an XNode, via its CreateReader and 
CreateWriter methods. 


We describe XML readers and writers and how to use them 
with the X-DOM in Chapter 11. 


Saving and Serializing 


Calling ToString on any node converts its content to an XML string—formatted 
with line breaks and indentation as we just saw. (You can disable the line breaks 
and indentation by specifying SaveOptions.DisableFormatting when calling 
ToString.) 
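A minimal sketch contrasting the two ToString forms (the element names are arbitrary):

```csharp
using System;
using System.Xml.Linq;

var e = new XElement ("a", new XElement ("b", 1));

string formatted = e.ToString();                                 // indented, spans several lines
string compact   = e.ToString (SaveOptions.DisableFormatting);   // <a><b>1</b></a>

Console.WriteLine (formatted);
Console.WriteLine (compact);
```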


XElement and XDocument also provide a Save method that writes an X-DOM to a 
file, Stream, TextWriter, or XmlWriter. If you specify a file, an XML declaration is 
automatically written. There is also a WriteTo method defined in the XNode class,
which accepts just an XmlWriter. 


We describe in more detail the handling of XML declarations when saving in 
“Documents and Declarations” on page 487. 


Instantiating an X-DOM 


Rather than using the Load or Parse methods, you can build an X-DOM tree by 
manually instantiating objects and adding them to a parent via XContainer’s Add 
method. 


To construct an XElement and XAttribute, simply provide a name and value:


XElement lastName = new XElement ("lastname", "Bloggs");
lastName.Add (new XComment ("nice name"));

XElement customer = new XElement ("customer");
customer.Add (new XAttribute ("id", 123));
customer.Add (new XElement ("firstname", "Joe"));
customer.Add (lastName);


Console.WriteLine (customer.ToString()); 
Here's the result: 


<customer id="123"> 
<firstname>Joe</firstname> 
  <lastname>Bloggs<!--nice name--></lastname>
</customer> 


A value is optional when constructing an XElement—you can provide just the ele- 
ment name and add content later. Notice that when we did provide a value, a simple 
string sufficed—we didn't need to explicitly create and add an XText child node. The 
X-DOM does this work automatically, so you can deal simply with “values.” 


Functional Construction 


In our preceding example, it’s difficult to glean the XML structure from the code. X- 
DOM supports another mode of instantiation, called functional construction (from 
functional programming). With functional construction, you build an entire tree in 
a single expression: 


XElement customer =
  new XElement ("customer", new XAttribute ("id", 123),
    new XElement ("firstname", "joe"),
    new XElement ("lastname", "bloggs",
      new XComment ("nice name")
    )
  );
This has two benefits. First, the code resembles the shape of the XML. Second, it can
be incorporated into the select clause of a LINQ query. For example, the following 
query projects from an EF Core entity class into an X-DOM: 
XElement query =
  new XElement ("customers",
    from c in dbContext.Customers.AsEnumerable()
    select
      new XElement ("customer", new XAttribute ("id", c.ID),
        new XElement ("firstname", c.FirstName),
        new XElement ("lastname", c.LastName,
          new XComment ("nice name")
        )
      )
  );
We examine this further in “Projecting into an X-DOM” on page 497. 
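The same projection works against any in-memory sequence, which is handy for experimenting without a database. In this sketch, the customers array is hypothetical data standing in for dbContext.Customers:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory data standing in for dbContext.Customers
var customers = new[]
{
  new { ID = 1, FirstName = "Joe", LastName = "Bloggs" },
  new { ID = 2, FirstName = "Sue", LastName = "Smith" }
};

XElement query =
  new XElement ("customers",
    from c in customers                      // the query is enumerated by the constructor
    select
      new XElement ("customer", new XAttribute ("id", c.ID),
        new XElement ("firstname", c.FirstName),
        new XElement ("lastname", c.LastName)
      )
  );

Console.WriteLine (query);
```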


Specifying Content 
Functional construction is possible because the constructors for XElement (and 
XDocument) are overloaded to accept a params object array: 
public XElement (XName name, params object[] content) 
The same holds true for the Add method in XContainer: 
public void Add (params object[] content) 


Hence, you can specify any number of child objects of any type when building or 
appending an X-DOM. This works because anything counts as legal content. To see 
how, we need to examine how each content object is processed internally. Here are 
the decisions made by XContainer, in order: 


1. If the object is null, it’s ignored. 


2. If the object is based on XNode or XStreamingElement, it’s added as is to the 
Nodes collection. 


3. If the object is an XAttribute, it’s added to the Attributes collection. 
4. If the object is a string, it’s wrapped in an XText node and added to Nodes.¹


5. If the object implements IEnumerable, it’s enumerated, and the same rules are 
applied to each element. 


6. Otherwise, the object is converted to a string, wrapped in an XText node, and
then added to Nodes.²





1 The X-DOM actually optimizes this step internally by storing simple text content in a string. The
XText node is not actually created until you call Nodes() on the XContainer.


2 See footnote 1. 
Everything ends up in one of two buckets: Nodes or Attributes. Furthermore, any
object is valid content because XContainer can always ultimately call ToString on it
and treat it as an XText node.


Before calling ToString on an arbitrary type, XContainer first 
tests whether it is one of the following types: 


float, double, decimal, bool, 

DateTime, DateTimeOffset, TimeSpan 
If so, it calls an appropriate typed ToString method on the 
XmlConvert helper class instead of calling ToString on the 
object itself. This ensures that the data is round-trippable and 
compliant with standard XML formatting rules. 
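The following sketch passes one argument of each kind to the XElement constructor so you can see where each ends up. The element and attribute names are arbitrary:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var root = new XElement ("root",
  null,                                              // rule 1: ignored
  new XElement ("first"),                            // rule 2: an XNode, added as is
  new XAttribute ("id", 7),                          // rule 3: goes to Attributes
  "text",                                            // rule 4: stored as text content
  new[] { new XElement ("a"), new XElement ("b") },  // rule 5: enumerated, each item added
  true,                                              // rule 6: converted via XmlConvert ("true")
  1.5);                                              // rule 6: converted via XmlConvert ("1.5")

Console.WriteLine (root.ToString (SaveOptions.DisableFormatting));
```

Value concatenates the text of all descendants, so root.Value here is "texttrue1.5".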


Automatic Deep Cloning 


When a node or attribute is added to an element (whether via functional construc- 
tion or an Add method), the node or attribute's Parent property is set to that ele- 
ment. A node can have only one parent element: if you add an already parented 
node to a second parent, the node is automatically deep-cloned. In the following 
example, each customer has a separate copy of address: 


var address = new XElement ("address",
                new XElement ("street", "Lawley St"),
                new XElement ("town", "North Beach")
              );

var customer1 = new XElement ("customer1", address);
var customer2 = new XElement ("customer2", address);


customer1.Element ("address").Element ("street").Value = "Another St"; 
Console.WriteLine ( 
customer2.Element ("address").Element ("street").Value); // Lawley St 


This automatic duplication keeps X-DOM object instantiation free of side effects— 
another hallmark of functional programming. 
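The following sketch makes the cloning visible with reference comparisons. The first Add receives the original address object because it has no parent yet; by the time of the second Add, address is already parented, so customer2 gets a deep copy:

```csharp
using System;
using System.Xml.Linq;

var address = new XElement ("address",
                new XElement ("street", "Lawley St"),
                new XElement ("town", "North Beach"));

var customer1 = new XElement ("customer1", address);   // no parent yet: added as is
var customer2 = new XElement ("customer2", address);   // already parented: deep-cloned

bool firstIsOriginal  = ReferenceEquals (customer1.Element ("address"), address);   // true
bool secondIsOriginal = ReferenceEquals (customer2.Element ("address"), address);   // false
bool parentUnchanged  = address.Parent == customer1;                                 // true

customer1.Element ("address").Element ("street").Value = "Another St";
Console.WriteLine (customer2.Element ("address").Element ("street").Value);   // Lawley St
```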


Navigating and Querying 


As you might expect, the XNode and XContainer classes define methods and proper- 
ties for traversing the X-DOM tree. Unlike a conventional DOM, however, these 
functions don't return a collection that implements IList<T>. Instead, they return 
either a single value or a sequence that implements IEnumerable<T>—upon which 
you are then expected to execute a LINQ query (or enumerate with a foreach). This 
allows for advanced queries as well as simple navigation tasks—using familiar LINQ 
query syntax. 


Element and attribute names are case sensitive in the X-DOM, 
just as they are in XML. 
Child Node Navigation


Return type               Members                          Works on
XNode                     FirstNode { get; }               XContainer
                          LastNode { get; }                XContainer
IEnumerable<XNode>        Nodes()                          XContainer*
                          DescendantNodes()                XContainer*
                          DescendantNodesAndSelf()         XElement*
XElement                  Element (XName)                  XContainer
IEnumerable<XElement>     Elements()                       XContainer*
                          Elements (XName)                 XContainer*
                          Descendants()                    XContainer*
                          Descendants (XName)              XContainer*
                          DescendantsAndSelf()             XElement*
                          DescendantsAndSelf (XName)       XElement*
bool                      HasElements { get; }             XElement





Functions marked with an asterisk in the third column of this 
and other tables also operate on sequences of the same type. 
For instance, you can call Nodes on either an XContainer or a
sequence of XContainer objects. This is possible because of
extension methods defined in System.Xml.Linq: the supplementary
query operators we talked about in the overview.


FirstNode, LastNode, and Nodes 


FirstNode and LastNode give you direct access to the first or last child node; Nodes 
returns all children as a sequence. All three functions consider only direct 
descendants: 


var bench = new XElement ("bench",
              new XElement ("toolbox",
                new XElement ("handtool", "Hammer"),
                new XElement ("handtool", "Rasp")
              ),
              new XElement ("toolbox",
                new XElement ("handtool", "Saw"),
                new XElement ("powertool", "Nailgun")
              ),
              new XComment ("Be careful with the nailgun")
            );

foreach (XNode node in bench.Nodes())
  Console.WriteLine (node.ToString (SaveOptions.DisableFormatting) + ".");


This is the output: 







<toolbox><handtool>Hammer</handtool><handtool>Rasp</handtool></toolbox>. 
<toolbox><handtool>Saw</handtool><powertool>Nailgun</powertool></toolbox>.
<!--Be careful with the nailgun-->. 


Retrieving elements 


The Elements method returns just the child nodes of type XElement: 


foreach (XElement e in bench.Elements()) 
Console.WriteLine (e.Name + "=" + e.Value); // toolbox=HammerRasp 
// toolbox=SawNailgun 


The following LINQ query finds the toolbox with the nail gun: 


IEnumerable<string> query =
  from toolbox in bench.Elements()
  where toolbox.Elements().Any (tool => tool.Value == "Nailgun")
  select toolbox.Value;


RESULT: { "SawNailgun" } 


The next example uses a SelectMany query to retrieve the hand tools in all 
toolboxes: 


IEnumerable<string> query = 
from toolbox in bench.Elements() 
from tool in toolbox.Elements() 
where tool.Name == "handtool" 
select tool.Value;


RESULT: { "Hammer", "Rasp", "Saw" } 


Elements itself is equivalent to a LINQ query on Nodes. Our 
preceding query could be started as follows: 


from toolbox in bench.Nodes().OfType<XElement>()
where ... 


Elements can also return just the elements of a given name: 
int x = bench.Elements ("toolbox").Count();   // 2
This is equivalent to the following: 
int x = bench.Elements().Where (e => e.Name == "toolbox").Count(); // 2 


Elements is also defined as an extension method accepting IEnumerable<XContainer>
or, more precisely, it accepts an argument of this type:


IEnumerable<T> where T : XContainer


This allows it to work with sequences of elements, too. Using this method, we can 
rewrite the query that finds the hand tools in all toolboxes as follows: 


from tool in bench.Elements ("toolbox").Elements ("handtool") 
select tool.Value; 
The first call to Elements binds to XContainer’s instance method; the second call to
Elements binds to the extension method. 
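Putting these members together, here’s a sketch that rebuilds the bench example and contrasts direct-child navigation (FirstNode, LastNode) with chained Elements calls and Descendants:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var bench = new XElement ("bench",
  new XElement ("toolbox",
    new XElement ("handtool", "Hammer"),
    new XElement ("handtool", "Rasp")),
  new XElement ("toolbox",
    new XElement ("handtool", "Saw"),
    new XElement ("powertool", "Nailgun")),
  new XComment ("Be careful with the nailgun"));

// FirstNode and LastNode look only at direct children
Console.WriteLine (bench.FirstNode.GetType().Name);   // XElement
Console.WriteLine (bench.LastNode.GetType().Name);    // XComment

// Instance Elements, then the extension-method Elements over the resulting sequence
string[] handTools = bench.Elements ("toolbox").Elements ("handtool")
                          .Select (t => t.Value).ToArray();   // Hammer, Rasp, Saw

// Descendants searches the whole subtree, so it finds the same three elements here
int viaDescendants = bench.Descendants ("handtool").Count();  // 3
```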
Retrieving a single element





The method Element (singular) returns the first matching element of the given 
name. Element is useful for simple navigation, as follows: 


XElement settings = XElement.Load ("databaseSettings.xml"); 
string cx = settings.Element ("database").Element ("connectString").Value; 


Element is equivalent to calling Elements() and then applying LINQ’s FirstOrDefault
query operator with a name-matching predicate. Element returns null if
the requested element doesn’t exist. 


Element("xyz").Value will throw a NullReferenceException
if element xyz does not exist. If you’d prefer a null
to an exception, either use the null-conditional operator
(Element("xyz")?.Value) or cast the XElement to a string
instead of querying its Value property. In other words:


string xyz = (string) settings.Element ("xyz"); 


This works because XElement defines an explicit string
conversion, just for this purpose!
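This sketch shows the three null-safe styles side by side. It uses XElement.Parse with a made-up connection string instead of loading databaseSettings.xml, so it runs without a file:

```csharp
using System;
using System.Xml.Linq;

// Parse stands in for XElement.Load ("databaseSettings.xml"); the content is made up
XElement settings = XElement.Parse (
  "<settings><database><connectString>abc</connectString></database></settings>");

string cx = settings.Element ("database").Element ("connectString").Value;   // abc

string viaNullConditional = settings.Element ("xyz")?.Value;   // null, no exception
string viaCast = (string) settings.Element ("xyz");            // null, no exception
int? timeout = (int?) settings.Element ("timeout");            // null: nullable casts also tolerate a missing element

Console.WriteLine (viaCast == null);   // True
```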


Retrieving descendants 


XContainer also provides Descendants and DescendantNodes methods that return 
child elements or nodes plus all of their children, and so on (the entire tree). 
Descendants accepts an optional element name. Returning to our earlier example, 
we can use Descendants to find all of the hand tools: 


Console.WriteLine (bench.Descendants ("handtool").Count()); // 3 
Both parent and leaf nodes are included, as the following example demonstrates: 


foreach (XNode node in bench.DescendantNodes()) 
  Console.WriteLine (node.ToString (SaveOptions.DisableFormatting));


Here's the output: 


<toolbox><handtool>Hammer</handtool><handtool>Rasp</handtool></toolbox>
<handtool>Hammer</handtool>
Hammer
<handtool>Rasp</handtool>
Rasp
<toolbox><handtool>Saw</handtool><powertool>Nailgun</powertool></toolbox>
<handtool>Saw</handtool>
Saw
<powertool>Nailgun</powertool>
Nailgun
<!--Be careful with the nailgun-->
The next query extracts all comments anywhere within the X-DOM that contain the
word careful: 


IEnumerable<string> query =
  from c in bench.DescendantNodes().OfType<XComment>()
  where c.Value.Contains ("careful")
  orderby c.Value
  select c.Value;


Parent Navigation 


All XNodes have a Parent property and AncestorXXX methods for parent navigation.
A parent is always an XElement: 


Return type               Members                      Works on
XElement                  Parent { get; }              XNode
IEnumerable<XElement>     Ancestors()                  XNode
                          Ancestors (XName)            XNode
                          AncestorsAndSelf()           XElement
                          AncestorsAndSelf (XName)     XElement





If x is an XElement, the following always prints true: 


foreach (XNode child in x.Nodes()) 
Console.WriteLine (child.Parent == x); 


However, the same is not the case if x is an XDocument. XDocument is peculiar: it can
have children but can never be anyone’s parent! To access the XDocument, you
instead use the Document property; this works on any object in the X-DOM tree.


Ancestors returns a sequence whose first element is Parent, and whose next ele- 
ment is Parent.Parent, and so on, until the root element. 


You can navigate to the root element with the LINQ query 
AncestorsAndSelf().Last(). 


Another way to achieve the same thing is to access
Document.Root, although this works only if an XDocument is
present.
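A minimal sketch of parent navigation on a small, made-up document. It also confirms that the root element’s Parent is null even though it sits inside an XDocument:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var doc = new XDocument (
  new XElement ("root",
    new XElement ("child",
      new XElement ("grandchild"))));

XElement grandchild = doc.Descendants ("grandchild").Single();

// Ancestors: Parent first, then Parent.Parent, up to the root element
string[] names = grandchild.Ancestors().Select (a => a.Name.LocalName).ToArray();   // child, root

XElement root = grandchild.AncestorsAndSelf().Last();   // the root element
bool sameAsDocRoot   = root == doc.Root;                 // true
bool rootHasNoParent = doc.Root.Parent == null;          // true: an XDocument is never a Parent
bool documentFound   = grandchild.Document == doc;       // true: Document works from any node
```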
Peer Node Navigation 
Return type               Members                            Defined in
bool                      IsBefore (XNode node)              XNode
                          IsAfter (XNode node)               XNode
XNode                     PreviousNode { get; }              XNode
                          NextNode { get; }                  XNode
IEnumerable<XNode>        NodesBeforeSelf()                  XNode
                          NodesAfterSelf()                   XNode
IEnumerable<XElement>     ElementsBeforeSelf()               XNode
                          ElementsBeforeSelf (XName name)    XNode
                          ElementsAfterSelf()                XNode
                          ElementsAfterSelf (XName name)     XNode





With PreviousNode and NextNode (and FirstNode/LastNode), you can traverse 
nodes with the feel of a linked list. This is noncoincidental: internally, nodes are 
stored in a linked list. 


XNode internally uses a singly linked list, so PreviousNode is 
not performant. 
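Here’s a small sketch of peer navigation on a flat list of elements (the element names are arbitrary):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement items = XElement.Parse ("<items><one/><two/><three/></items>");

XNode first = items.FirstNode;
XNode second = first.NextNode;             // follows the forward link
XNode backAgain = second.PreviousNode;     // works, but is less efficient: the list is singly linked

bool ordered = first.IsBefore (second);    // true
string[] later = items.Element ("one").ElementsAfterSelf()
                      .Select (e => e.Name.LocalName).ToArray();   // two, three
```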


Attribute Navigation 

Return type               Members                      Defined in
bool                      HasAttributes { get; }       XElement
XAttribute                Attribute (XName name)       XElement
                          FirstAttribute { get; }      XElement
                          LastAttribute { get; }       XElement
IEnumerable<XAttribute>   Attributes()                 XElement
                          Attributes (XName name)      XElement





In addition, XAttribute defines PreviousAttribute and NextAttribute properties 
as well as Parent. 


The Attributes method that accepts a name returns a sequence with either zero or 
one element; an element cannot have duplicate attribute names in XML. 
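A short sketch of attribute navigation, using a made-up client element. The explicit (int) conversion on XAttribute parses the attribute’s text:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XElement client = XElement.Parse ("<client enabled='true' timeout='30'/>");

bool hasAttributes = client.HasAttributes;             // true
XAttribute first = client.FirstAttribute;              // enabled="true"
XAttribute next = first.NextAttribute;                 // timeout="30"
bool sameParent = next.Parent == client;               // true

int timeout = (int) client.Attribute ("timeout");      // 30
int enabledCount = client.Attributes ("enabled").Count();   // 1: names are unique per element
int retriesCount = client.Attributes ("retries").Count();   // 0: no such attribute

Console.WriteLine (next);   // timeout="30"
```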


Updating an X-DOM 
You can update elements and attributes in the following ways: 


• Call SetValue or reassign the Value property
• Call SetElementValue or SetAttributeValue
• Call one of the RemoveXXX methods
• Call one of the AddXXX or ReplaceXXX methods, specifying fresh content
You can also reassign the Name property on XElement objects.


Simple Value Updates 


Members                    Works on
SetValue (object value)    XElement, XAttribute
Value { get; set; }        XElement, XAttribute





The SetValue method replaces an element or attribute's content with a simple value. 
Setting the Value property does the same but accepts string data only. We describe 
both of these functions in detail later in “Working with Values” on page 484. 


An effect of calling SetValue (or reassigning Value) is that it replaces all child 
nodes: 


XElement settings = new XElement ("settings", 
new XElement ("timeout", 30) 
); 
settings.SetValue ("blah"); 
Console.WriteLine (settings.ToString()); // <settings>blah</settings> 


Updating Child Nodes and Attributes 


Category   Members                                        Works on
Add        Add (params object[] content)                  XContainer
           AddFirst (params object[] content)             XContainer
Remove     RemoveNodes()                                  XContainer
           RemoveAttributes()                             XElement
           RemoveAll()                                    XElement
Update     ReplaceNodes (params object[] content)         XContainer
           ReplaceAttributes (params object[] content)    XElement
           ReplaceAll (params object[] content)           XElement
           SetElementValue (XName name, object value)     XElement
           SetAttributeValue (XName name, object value)   XElement





The most convenient methods in this group are the last two: SetElementValue and
SetAttributeValue. They serve as shortcuts for instantiating an XElement or 
XAttribute and then Adding it to a parent, replacing any existing element or 
attribute of that name: 


XElement settings = new XElement ("settings"); 
settings.SetElementValue ("timeout", 30); // Adds child node 
settings.SetElementValue ("timeout", 60); // Update it to 60 
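The same two methods can also remove content: the .NET documentation for SetElementValue and SetAttributeValue states that a null value removes the child element or attribute of that name. A minimal sketch (top-level statements, C# 9 or later); the version attribute is purely illustrative:

```csharp
using System;
using System.Xml.Linq;

var settings = new XElement ("settings");
settings.SetElementValue ("timeout", 30);        // Adds <timeout>30</timeout>
settings.SetAttributeValue ("version", "1.0");   // Adds version="1.0"
settings.SetElementValue ("timeout", 60);        // Updates the existing child

string before = settings.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (before);   // <settings version="1.0"><timeout>60</timeout></settings>

// A null value removes the element or attribute with that name
settings.SetElementValue ("timeout", null);
settings.SetAttributeValue ("version", null);
Console.WriteLine (settings.Element ("timeout") == null);     // True
Console.WriteLine (settings.Attribute ("version") == null);   // True
```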





482 | Chapter 10: LINQ to XML 


Add appends a child node to an element or document. AddFirst does the same thing 
but inserts at the beginning of the collection rather than the end. 
You can remove all child nodes or attributes in one hit with RemoveNodes or
RemoveAttributes. RemoveAll is equivalent to calling both methods.
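A short sketch of these methods working together (top-level statements, C# 9 or later); the list element and its children are illustrative:

```csharp
using System;
using System.Xml.Linq;

var list = new XElement ("list", new XAttribute ("id", 1));
list.Add (new XElement ("b"));        // Appended at the end
list.AddFirst (new XElement ("a"));   // Inserted at the beginning
list.Add (new XElement ("c"));

string before = list.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (before);           // <list id="1"><a /><b /><c /></list>

list.RemoveAll();                     // Removes child nodes and attributes alike
Console.WriteLine (list.HasElements);     // False
Console.WriteLine (list.HasAttributes);   // False
```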





The ReplaceXXX methods are equivalent to Removing and then Adding. They take a
snapshot of the input, so e.ReplaceNodes(e.Nodes()) works as expected.
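Because of that snapshot, the input can even be a query over the element's own children. A minimal sketch (top-level statements, C# 9 or later) that reverses the child order in place:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var e = XElement.Parse ("<items><one/><two/><three/></items>");

// ReplaceNodes reads the whole sequence before removing anything,
// so deriving the new content from e's current children is safe
e.ReplaceNodes (e.Nodes().Reverse());

string result = e.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (result);   // <items><three /><two /><one /></items>
```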


Updating Through the Parent 


Members                                   Works on
AddBeforeSelf (params object[] content)   XNode
AddAfterSelf (params object[] content)    XNode
Remove()                                  XNode, XAttribute
ReplaceWith (params object[] content)     XNode





The methods AddBeforeSelf, AddAfterSelf, Remove, and ReplaceWith don't operate
on the node’s children. Instead, they operate on the collection that contains the
node itself. This requires that the node have a parent element; otherwise, an
exception is thrown. AddBeforeSelf and AddAfterSelf are useful for inserting a node
into an arbitrary position:


XElement items = new XElement ("items", 
new XElement ("one"), 
new XElement ("three") 
); 
items.FirstNode.AddAfterSelf (new XElement ("two")); 


Here’s the result: 
<items><one /><two /><three /></items> 


Inserting into an arbitrary position within a long sequence of elements is efficient 
because nodes are stored internally in a linked list. 
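A sketch that combines both insertion methods and shows the parent requirement (top-level statements, C# 9 or later). We catch InvalidOperationException, which is the exception type we understand XNode to throw when the node has no parent; the comment node is illustrative:

```csharp
using System;
using System.Xml.Linq;

var items = new XElement ("items",
  new XElement ("one"),
  new XElement ("three")
);
items.FirstNode.AddAfterSelf (new XElement ("two"));    // Between one and three
items.LastNode.AddBeforeSelf (new XComment ("last"));   // Just before three

string xml = items.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (xml);   // <items><one /><two /><!--last--><three /></items>

// A parentless node has no surrounding collection to insert into
bool threw = false;
try { new XElement ("orphan").AddAfterSelf (new XElement ("x")); }
catch (InvalidOperationException) { threw = true; }
Console.WriteLine (threw);   // True
```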


The Remove method removes the current node from its parent. ReplaceWith does 
the same and then inserts some other content at the same position: 


XElement items = XElement.Parse ("<items><one/><two/><three/></items>"); 
items.FirstNode.ReplaceWith (new XComment ("One was here")); 


Here’s the result: 


<items><!--One was here--><two /><three /></items>
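The same two operations as a runnable sketch (top-level statements, C# 9 or later), also showing that a removed node survives as a detached object:

```csharp
using System;
using System.Xml.Linq;

var items = XElement.Parse ("<items><one/><two/><three/></items>");
items.FirstNode.ReplaceWith (new XComment ("One was here"));

XNode last = items.LastNode;   // <three />
last.Remove();                 // Detached from items, but still a valid object

string xml = items.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (xml);                   // <items><!--One was here--><two /></items>
Console.WriteLine (last.Parent == null);   // True
```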


Removing a sequence of nodes or attributes 


Thanks to extension methods in System.Xml.Linq, you can also call Remove on a 
sequence of nodes or attributes. Consider this X-DOM: 
XElement contacts = XElement.Parse (
@"<contacts>
<customer name='Mary'/>
<customer name='Chris' archived='true'/>
<supplier name='Susan'>
<phone archived='true'>012345678<!--confidential--></phone>
</supplier>
</contacts>");


The following removes all customers: 
contacts.Elements ("customer").Remove();
The following removes all archived contacts (so Chris disappears): 


contacts.Elements().Where (e => (bool?) e.Attribute ("archived") == true) 
.Remove(); 


If we replaced Elements() with Descendants(), all archived elements throughout 
the DOM would disappear, yielding this result: 


<contacts> 
<customer name="Mary" /> 
<supplier name="Susan" /> 
</contacts> 


The next example removes all contacts that feature the comment “confidential”
anywhere in their tree:


contacts.Elements().Where (e => e.DescendantNodes()
.OfType<XComment>()
.Any (c => c.Value == "confidential")
).Remove();


This is the result: 


<contacts>
<customer name="Mary" />
<customer name="Chris" archived="true" />
</contacts>


Contrast this with the following simpler query, which strips all comment nodes 
from the tree: 


contacts.DescendantNodes().OfType<XComment>().Remove();


Internally, the Remove method first reads all matching elements
into a temporary list and then enumerates over the
temporary list to perform the deletions. This avoids errors
that could otherwise result from deleting and querying at the
same time.
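A self-contained version of the Descendants variant described earlier (top-level statements, C# 9 or later), checking the outcome rather than printing the whole tree:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var contacts = XElement.Parse (
@"<contacts>
  <customer name='Mary'/>
  <customer name='Chris' archived='true'/>
  <supplier name='Susan'>
    <phone archived='true'>012345678<!--confidential--></phone>
  </supplier>
</contacts>");

// Remove snapshots the matches first, so querying and deleting don't clash
contacts.Descendants()
        .Where (e => (bool?) e.Attribute ("archived") == true)
        .Remove();

Console.WriteLine (contacts.Elements().Count());           // 2 (Mary and Susan)
Console.WriteLine (contacts.Descendants ("phone").Any());  // False
```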


Working with Values 


XElement and XAttribute both have a Value property of type string. If an element
has a single XText child node, XElement’s Value property acts as a convenient
shortcut to the content of that node. With XAttribute, the Value property is simply
the attribute's value.


Despite the storage differences, the X-DOM provides a consistent set of operations 
for working with element and attribute values. 


Setting Values 


There are two ways to assign a value: call SetValue or assign the Value property. 
SetValue is more flexible because it accepts not just strings, but other simple data 
types, too: 

var e = new XElement ("date", DateTime.Now);
e.SetValue (DateTime.Now.AddDays(1));
Console.Write (e.Value); // 2019-10-02T16:39:10.734375+09:00


We could instead have set the element’s Value property, but this would mean
manually converting the DateTime to a string. This is more complicated than calling
ToString: it requires the use of XmlConvert for an XML-compliant result.


When you pass a value into XElement or XAttribute’s constructor, the same
automatic conversion takes place for nonstring types. This ensures that DateTimes are
correctly formatted, true is written in lowercase, and double.NegativeInfinity is
written as “-INF”.
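To see both conversions side by side, here is a minimal sketch (top-level statements, C# 9 or later). It relies on our understanding that SetValue formats a DateTime the way XmlConvert.ToString does with XmlDateTimeSerializationMode.RoundtripKind; the date itself is arbitrary:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

var when = new DateTime (2019, 10, 2, 16, 39, 10, DateTimeKind.Utc);

// Converting by hand: XmlConvert yields the XML Schema date format
string manual = XmlConvert.ToString (when, XmlDateTimeSerializationMode.RoundtripKind);

var date = new XElement ("date");
date.SetValue (when);                       // Converted automatically
Console.WriteLine (date.Value == manual);   // True

// Constructors apply the same conversions to nonstring values
string flag = new XElement ("flag", true).Value;
string min = new XAttribute ("min", double.NegativeInfinity).Value;
Console.WriteLine (flag);   // true
Console.WriteLine (min);    // -INF
```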


Getting Values 


To go the other way around and parse a Value back to a base type, you simply cast 
the XElement or XAttribute to the desired type. It sounds like it shouldn't work— 
but it does! For instance: 


XElement e = new XElement ("now", DateTime.Now); 
DateTime dt = (DateTime) e; 


XAttribute a = new XAttribute ("resolution", 1.234); 
double res = (double) a; 


An element or attribute doesn’t store DateTimes or numbers natively; they’re
always stored as text and then parsed as needed. It also doesn't “remember” the
original type, so you must cast it correctly to avoid a runtime error. To make your code
robust, you can put the cast in a try/catch block, catching a FormatException.
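A sketch of both directions: a value that parses cleanly, and one that triggers the FormatException just mentioned (top-level statements, C# 9 or later; the element names are illustrative):

```csharp
using System;
using System.Xml.Linq;

var a = new XAttribute ("resolution", 1.234);
double res = (double) a;          // Parsed back from the stored text

var bad = new XElement ("count", "twelve");
int? count = null;
try
{
  count = (int) bad;              // "twelve" isn't a valid integer
}
catch (FormatException)
{
  Console.WriteLine ("count is not a number");
}
```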


Explicit casts on XElement and XAttribute can parse to the following types:
• All standard numeric types
• string, bool, DateTime, DateTimeOffset, TimeSpan, and Guid
• Nullable<> versions of the aforementioned value types


Casting to a nullable type is useful in conjunction with the Element and Attribute
methods, because if the requested name doesn't exist, the cast still works. For
instance, if x has no timeout element, the first line generates a runtime error and
the second line does not:


int timeout = (int) x.Element ("timeout"); // Error 
int? timeout = (int?) x.Element ("timeout"); // OK; timeout is null. 


You can factor away the nullable type in the final result with the ?? operator. The 
following evaluates to 1.0 if the resolution attribute doesn’t exist: 


double resolution = (double?) x.Attribute ("resolution") ?? 1.0; 


Casting to a nullable type won't get you out of trouble, though, if the element or 
attribute exists and has an empty (or improperly formatted) value. For this, you 
must catch a FormatException. 
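The following sketch (top-level statements, C# 9 or later) puts the nullable cast, the ?? fallback, and the empty-value trap side by side; the settings document is hypothetical:

```csharp
using System;
using System.Xml.Linq;

var x = XElement.Parse ("<settings><timeout>30</timeout><retries/></settings>");

int? timeout = (int?) x.Element ("timeout");                      // 30
int? maxSize = (int?) x.Element ("maxSize");                      // null: no such element
double resolution = (double?) x.Attribute ("resolution") ?? 1.0;  // 1.0: no such attribute

// <retries/> exists but is empty, so the nullable cast still tries to parse ""
bool formatError = false;
try { _ = (int?) x.Element ("retries"); }
catch (FormatException) { formatError = true; }
Console.WriteLine (formatError);   // True
```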


You can also use casts in LINQ queries. The following returns “John”: 


var data = XElement.Parse ( 
@"<data> 
<customer id='1' name='Mary' credit='100' /> 
<customer id='2' name='John' credit='150' /> 
<customer id='3' name='Anne' /> 
</data>"); 


IEnumerable<string> query = from cust in data.Elements()
where (int?) cust.Attribute ("credit") > 100
select cust.Attribute ("name").Value;
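Wrapped up as a runnable program (top-level statements, C# 9 or later), the same query yields only John:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var data = XElement.Parse (
@"<data>
  <customer id='1' name='Mary' credit='100' />
  <customer id='2' name='John' credit='150' />
  <customer id='3' name='Anne' />
</data>");

// (int?) yields null for Anne's missing attribute, and null > 100 is false
var query = from cust in data.Elements()
            where (int?) cust.Attribute ("credit") > 100
            select cust.Attribute ("name").Value;

foreach (string name in query)
  Console.WriteLine (name);   // John
```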


Casting to a nullable int avoids an ArgumentNullException in the case of Anne,
who has no credit attribute. Another solution would be to add a predicate to the
where clause:


where cust.Attributes ("credit").Any() && (int) cust.Attribute... 


The same principles apply in querying element values. 


Values and Mixed Content Nodes 


Given the value of Value, you might wonder when you'd ever need to deal directly
with XText nodes. The answer is when you have mixed content. For example:


<summary>An XAttribute is <bold>not</bold> an XNode</summary> 


A simple Value property is not enough to capture summary’s content. The summary 
element contains three children: an XText node followed by an XElement, followed 
by another XText node. Here’s how to construct it: 


XElement summary = new XElement ("summary", 
new XText ("An XAttribute is "), 
new XElement ("bold", "not"), 
new XText (" an XNode") 
);


Interestingly, we can still query summary’s Value—without getting an exception. 
Instead, we get a concatenation of each child’s value: 
An XAttribute is not an XNode


It's also legal to reassign summary’s Value, at the cost of replacing all previous
children with a single new XText node.
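A sketch confirming both behaviors on the summary element (top-level statements, C# 9 or later); the replacement text is arbitrary:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var summary = new XElement ("summary",
  new XText ("An XAttribute is "),
  new XElement ("bold", "not"),
  new XText (" an XNode")
);

string original = summary.Value;               // All descendant text, concatenated
int originalCount = summary.Nodes().Count();
Console.WriteLine (original);                  // An XAttribute is not an XNode
Console.WriteLine (originalCount);             // 3

summary.Value = "Plain text now";              // Replaces every child with one XText
Console.WriteLine (summary.Nodes().Count());   // 1
```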
Automatic XText Concatenation


When you add simple content to an XElement, the X-DOM appends to the existing
XText child rather than creating a new one. In the following examples, e1 and e2
end up with just one child XText node whose value is HelloWorld:


var e1 = new XElement ("test", "Hello"); e1.Add ("World"); 
var e2 = new XElement ("test", "Hello", "World"); 


If you specifically create XText nodes, however, you end up with multiple children: 


var e = new XElement ("test", new XText ("Hello"), new XText ("World")); 
Console.WriteLine (e.Value); // HelloWorld 
Console.WriteLine (e.Nodes().Count()); // 2 


XElement doesn’t concatenate the two XText nodes, so the nodes’ object identities 
are preserved. 
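Counting child nodes makes the difference visible. A minimal sketch (top-level statements, C# 9 or later):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var e1 = new XElement ("test", "Hello"); e1.Add ("World");
var e2 = new XElement ("test", "Hello", "World");
var e3 = new XElement ("test", new XText ("Hello"), new XText ("World"));

// Plain strings merge into one XText; explicit XText nodes stay separate
Console.WriteLine (e1.Nodes().Count());   // 1
Console.WriteLine (e2.Nodes().Count());   // 1
Console.WriteLine (e3.Nodes().Count());   // 2
Console.WriteLine (e3.Value);             // HelloWorld
```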


Documents and Declarations 


XDocument 


As we said previously, an XDocument wraps a root XElement and allows you to add
an XDeclaration, processing instructions, a document type, and root-level
comments. An XDocument is optional and can be ignored or omitted: unlike with the
W3C DOM, it does not serve as glue to keep everything together.


An XDocument provides the same functional constructors as XElement. And because
it’s based on XContainer, it also supports the AddXXX, RemoveXXX, and ReplaceXXX
methods. Unlike XElement, however, an XDocument can accept only limited content:
• A single XElement object (the root)
• A single XDeclaration object
• A single XDocumentType object (to reference a document type definition
[DTD])
• Any number of XProcessingInstruction objects
• Any number of XComment objects


Of these, only the root XElement is mandatory in order to 
have a valid XDocument. The XDeclaration is optional—if 
omitted, default settings are applied during serialization. 
The simplest valid XDocument has just a root element:


var doc = new XDocument ( 
new XElement ("test", "data") 
); 


Notice that we didn’t include an XDeclaration object. The file generated by calling
doc.Save would still contain an XML declaration, however, because one is
generated by default.


The next example produces a simple but correct XHTML file, illustrating all the 
constructs that an XDocument can accept: 


var styleInstruction = new XProcessingInstruction ( 
"xml-stylesheet", "href='styles.css' type='text/css'"); 


var docType = new XDocumentType ("html",
"-//W3C//DTD XHTML 1.0 Strict//EN",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", null);


XNamespace ns = "http://www.w3.org/1999/xhtml";
var root =
new XElement (ns + "html",
new XElement (ns + "head",
new XElement (ns + "title", "An XHTML page")),
new XElement (ns + "body",
new XElement (ns + "p", "This is the content"))
);


var doc = 
new XDocument ( 
new XDeclaration ("1.0", "utf-8", "no"), 
new XComment ("Reference a stylesheet"), 
styleInstruction, 
docType, 
root); 


doc.Save ("test.html"); 


The resultant test.html reads as follows: 


<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!--Reference a stylesheet-->
<?xml-stylesheet href='styles.css' type='text/css'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head> 
<title>An XHTML page</title> 
</head> 
<body> 
<p>This is the content</p> 
</body> 
</html> 
XDocument has a Root property that serves as a shortcut for accessing a document's
single XElement. The reverse link is provided by XObject’s Document property,
which works for all objects in the tree:

Console.WriteLine (doc.Root.Name.LocalName);          // html
XElement bodyNode = doc.Root.Element (ns + "body");
Console.WriteLine (bodyNode.Document == doc);         // True


Recall that a document’s children have no Parent: 


Console.WriteLine (doc.Root.Parent == null); // True 
foreach (XNode node in doc.Nodes()) 
Console.Write (node.Parent == null); // TrueTrueTrueTrue 


An XDeclaration is not an XNode and does not appear in the 
document’s Nodes collection—unlike comments, processing 
instructions, and the root element. Instead, it’s assigned to a 
dedicated property called Declaration. This is why “True” is 
repeated four and not five times in the last example. 
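A smaller document makes the count easy to check. This sketch (top-level statements, C# 9 or later) uses placeholder content and a DOCTYPE with no public or system identifier:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var doc = new XDocument (
  new XDeclaration ("1.0", "utf-8", "no"),
  new XComment ("A comment"),
  new XProcessingInstruction ("xml-stylesheet", "href='styles.css' type='text/css'"),
  new XDocumentType ("html", null, null, null),
  new XElement ("html")
);

Console.WriteLine (doc.Nodes().Count());          // 4 (the declaration isn't a node)
Console.WriteLine (doc.Declaration.Standalone);   // no
Console.WriteLine (doc.Root.Name);                // html
```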


XML Declarations 
A standard XML file starts with a declaration such as the following: 
<?xml version="1.0" encoding="utf-8" standalone="yes"?> 


An XML declaration ensures that the file will be correctly parsed and understood by
a reader. XElement and XDocument follow these rules in emitting XML declarations:


• Calling Save with a filename always writes a declaration.
• Calling Save with an XmlWriter writes a declaration unless the XmlWriter is
instructed otherwise.
• The ToString method never emits an XML declaration.


You can instruct an XmlWriter not to produce a declaration 
by setting the OmitXmlDeclaration and ConformanceLevel 
properties of an XmlWriterSettings object when constructing 
the XmlWriter. We describe this in Chapter 11. 
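A sketch contrasting ToString with Save (top-level statements, C# 9 or later). It uses the Save overload that accepts a TextWriter, which, as we understand it, wraps the writer in an XmlWriter with default settings and so behaves like the XmlWriter case in the list above:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

var doc = new XDocument (
  new XDeclaration ("1.0", "utf-8", "yes"),
  new XElement ("test", "data")
);

string viaToString = doc.ToString();   // No declaration, whatever XDeclaration says

var sw = new StringWriter();
doc.Save (sw);                         // Goes through an XmlWriter, so a declaration is written
string viaSave = sw.ToString();

Console.WriteLine (viaToString);                   // <test>data</test>
Console.WriteLine (viaSave.StartsWith ("<?xml"));  // True
```") throw new Exception ("ToString should not emit a declaration");
if (!viaSave.StartsWith ("<?xml")) throw new Exception ("Save should emit a declaration");
</test>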


The presence or absence of an XDeclaration object has no effect on whether an 
XML declaration is written. The purpose of an XDeclaration is instead to hint the 
XML serialization, in two ways: 


• What text encoding to use
• What to put in the XML declaration’s encoding and standalone attributes
(should a declaration be written)
XDeclaration’s constructor accepts three arguments, which correspond to the
attributes version, encoding, and standalone. In the following example, test.xml is 
encoded in UTF-16: 


var doc = new XDocument ( 
new XDeclaration ("1.0", "utf-16", "yes"), 
new XElement ("test", "data") 
);
doc.Save ("test.xml"); 
Whatever you specify for the XML version is ignored by the 
XML writer: it always writes "1.0". 


The encoding must use an IETF code such as "utf-16", just as it would appear in 
the XML declaration. 
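One way to observe the encoding hint is to save into a MemoryStream and decode the bytes. This sketch (top-level statements, C# 9 or later) assumes, as we understand the implementation, that the Save overload taking a Stream applies the XDeclaration's encoding just as the filename overload does:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

var doc = new XDocument (
  new XDeclaration ("1.0", "utf-16", "yes"),
  new XElement ("test", "data")
);

var ms = new MemoryStream();
doc.Save (ms);   // The XDeclaration hints that the bytes should be UTF-16

// Decoding as UTF-16 only yields readable XML if UTF-16 was actually written
string text = Encoding.Unicode.GetString (ms.ToArray());
Console.WriteLine (text.Contains ("<test>data</test>"));   // True
```")) throw new Exception ("bytes were not UTF-16");
</test>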


Writing a declaration to a string 


Suppose that we want to serialize an XDocument to a string, including the XML
declaration. Because ToString doesn't write a declaration, we'd need to use an
XmlWriter instead:


var doc = new XDocument (
new XDeclaration ("1.0", "utf-8", "yes"),
new XElement ("test", "data")
);
var output = new StringBuilder();
var settings = new XmlWriterSettings { Indent = true };
using (XmlWriter xw = XmlWriter.Create (output, settings))
doc.Save (xw);

Console.WriteLine (output.ToString());


This is the result: 


<?xml version="1.0" encoding="utf-16" standalone="yes"?> 
<test>data</test> 


Notice that we have UTF-16 in the output, even though we explicitly requested 
UTF-8 in an XDeclaration! This might look like a bug, but in fact, XmlWriter is 
being remarkably smart. Because we're writing to a string and not a file or stream, 
it’s impossible to apply any encoding other than UTF-16—the format in which 
strings are internally stored. Hence, XmlWriter writes "utf-16", so as not to lie. 
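If you do need an in-memory result with a UTF-8 declaration, one option is to write to a byte stream rather than a StringBuilder and decode afterward. A minimal sketch (top-level statements, C# 9 or later); choosing UTF8Encoding without a byte order mark is our own assumption, made so that the decoded string starts cleanly:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

var doc = new XDocument (
  new XDeclaration ("1.0", "utf-8", "yes"),
  new XElement ("test", "data")
);

var ms = new MemoryStream();
var settings = new XmlWriterSettings
{
  Indent = true,
  Encoding = new UTF8Encoding (false)   // UTF-8 without a byte order mark
};
using (XmlWriter xw = XmlWriter.Create (ms, settings))
  doc.Save (xw);

// The bytes really are UTF-8, so the declaration can say so truthfully
string xml = Encoding.UTF8.GetString (ms.ToArray());
Console.WriteLine (xml);
```")) throw new Exception ("root element missing");
</test>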


This also explains why the ToString method doesn't emit an XML declaration. 
Imagine that instead of calling Save, you did the following to write an XDocument to 
a file: 


File.WriteAllText ("data.xml", doc.ToString()); 


As it stands, data.xml would lack an XML declaration, making it incomplete but still
parsable (you can infer the text encoding). But if ToString() emitted an XML
declaration, data.xml would actually contain an incorrect declaration
(encoding="utf-16"), which might prevent it from being read at all, because WriteAllText
encodes using UTF-8.


Names and Namespaces 


Just as .NET types can have namespaces, so too can XML elements and attributes. 


XML namespaces achieve two things. First, rather like namespaces in C#, they help
avoid naming collisions. This can become an issue when you merge data from one
XML file into another. Second, namespaces assign absolute meaning to a name. The
name “nil,” for instance, could mean anything. Within the
http://www.w3.org/2001/XMLSchema-instance namespace, however, “nil” means something
equivalent to null in C# and comes with specific rules on how it can be applied.


Because XML namespaces are a significant source of confusion, we first cover 
namespaces in general, and then move on to how they're used in LINQ to XML. 


Namespaces in XML 


Suppose that we want to define a customer element in the namespace
OReilly.Nutshell.CSharp. There are two ways to proceed. The first is to use the
xmlns attribute:


<customer xmlns="OReilly.Nutshell.CSharp"/> 


xmlns is a special reserved attribute. When used in this manner, it performs two 
functions: 


• It specifies a namespace for the element in question.
• It specifies a default namespace for all descendant elements.


This means that in the following example, address and postcode implicitly reside
in the OReilly.Nutshell.CSharp namespace:


<customer xmlns="OReilly.Nutshell.CSharp"> 
<address> 
<postcode>02138</postcode> 
</address> 
</customer> 


If we want address and postcode to have no namespace, we'd need to do this:


<customer xmlns="OReilly.Nutshell.CSharp"> 
<address xmlns="">
<postcode>02138</postcode> <!-- postcode now inherits empty ns --> 
</address> 
</customer> 
Prefixes


The other way to specify a namespace is with a prefix. A prefix is an alias that you 
assign to a namespace to save typing. There are two steps in using a prefix—defining 
the prefix and using it. You can do both together: 


<nut:customer xmlns:nut="OReilly.Nutshell.CSharp"/> 


Two distinct things are happening here. On the right, xmlns:nut="..." defines a
prefix called nut and makes it available to this element and all its descendants. On
the left, nut:customer assigns the newly allocated prefix to the customer element.


A prefixed element does not define a default namespace for descendants. In the
following XML, firstname has an empty namespace:


<nut:customer xmlns:nut="OReilly.Nutshell.CSharp"> 
<firstname>Joe</firstname> 
</nut:customer>


To give firstname the OReilly.Nutshell.CSharp namespace, you must do this:


<nut:customer xmlns:nut="OReilly.Nutshell.CSharp">
<nut:firstname>Joe</nut:firstname>
</nut:customer>


You can also define a prefix—or prefixes—for the convenience of your descendants, 
without assigning any of them to the parent element itself. The following defines 
two prefixes, i and z, while leaving the customer element itself with an empty 
namespace: 


<customer xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
xmlns:z="http://schemas.microsoft.com/2003/10/Serialization/">
</customer>


If this were the root node, the whole document would have i and z at its fingertips. 
Prefixes are convenient when elements need to draw from multiple namespaces. 


Notice that both namespaces in this example are URIs. Using URIs (that you own) is 
standard practice: it ensures namespace uniqueness. So, in real life, our customer 
element would more likely be 


<customer xmlns="http://oreilly.com/schemas/nutshell/csharp"/> 


or: 


<nut:customer xmlns:nut="http://oreilly.com/schemas/nutshell/csharp"/> 


Attributes 


You can assign namespaces to attributes, too. The main difference is that it always 
requires a prefix; for instance: 


<customer xmlns:nut="OReilly.Nutshell.CSharp" nut:id="123" /> 
Another difference is that an unqualified attribute always has an empty namespace:
it never inherits a default namespace from a parent element.


Attributes tend not to need namespaces because their meaning is usually local to the 
element. An exception is with general-purpose or metadata attributes such as the 
nil attribute defined by W3C: 


<customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<firstname>Joe</firstname> 
<lastname xsi:nil="true"/> 
</customer> 
This indicates unambiguously that lastname is nil (null in C#) and not an empty
string. Because we've used the standard namespace, a general-purpose parsing
utility could know our intention with certainty.


Specifying Namespaces in the X-DOM 


So far in this chapter, we've used just simple strings for XElement and XAttribute 
names. A simple string corresponds to an XML name with an empty namespace— 
rather like a .NET type defined in the global namespace. 


There are a couple of ways to specify an XML namespace. The first is to enclose it in 
braces, before the local name: 


var e = new XElement ("{http://domain.com/xmlspace}customer", "Bloggs"); 
Console.WriteLine (e.ToString()); 


This yields the resulting XML: 
<customer xmlns="http://domain.com/xmlspace">Bloggs</customer> 
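The braces are only a notation: the X-DOM stores the namespace and the local name separately. A quick sketch (top-level statements, C# 9 or later):

```csharp
using System;
using System.Xml.Linq;

var e = new XElement ("{http://domain.com/xmlspace}customer", "Bloggs");

Console.WriteLine (e.Name.LocalName);       // customer
Console.WriteLine (e.Name.NamespaceName);   // http://domain.com/xmlspace
Console.WriteLine (e.Name);                 // {http://domain.com/xmlspace}customer
```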


The second (and more performant) approach is to use the XNamespace and XName 
types. Here are their definitions: 


public sealed class XNamespace
{
public string NamespaceName { get; }
}
public sealed class XName // A local name with optional namespace
{
public string LocalName { get; }
public XNamespace Namespace { get; } // Optional
}


Both types define implicit casts from string, so the following is legal: 


XNamespace ns = "http://domain.com/xmlspace";
XName localName = "customer";
XName fullName = "{http://domain.com/xmlspace}customer";


XNamespace also overloads the + operator, allowing you to combine a namespace 
and name into an XName without using braces: 
XNamespace ns = "http://domain.com/xmlspace";
XName fullName = ns + "customer";
Console.WriteLine (fullName); // {http://domain.com/xmlspace}customer


All constructors and methods in the X-DOM that accept an element or attribute
name actually accept an XName object rather than a string. The reason you can
substitute a string (as in all our examples to date) is the implicit cast.
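The .NET documentation describes XName objects as atomized: names with the same namespace and local name share one instance. A sketch (top-level statements, C# 9 or later) showing three routes to the same name:

```csharp
using System;
using System.Xml.Linq;

XNamespace ns = "http://domain.com/xmlspace";
XName viaPlus   = ns + "customer";                          // + operator
XName viaString = "{http://domain.com/xmlspace}customer";   // Implicit cast
XName viaGet    = XName.Get ("customer", "http://domain.com/xmlspace");

Console.WriteLine (viaPlus == viaString);               // True
Console.WriteLine (ReferenceEquals (viaPlus, viaGet));  // True: one shared instance
```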


Specifying a namespace is the same whether for an element or an attribute: 


XNamespace ns = "http://domain.com/xmlspace"; 
var data = new XElement (ns + "data", 
new XAttribute (ns + "id", 123) 
);


The X-DOM and Default Namespaces 


The X-DOM ignores the concept of default namespaces until it comes time to 
actually output XML. This means that when you construct a child XElement, you 
must give it a namespace explicitly if needed: it will not inherit from the parent: 


XNamespace ns = "http://domain.com/xmlspace";

var data = new XElement (ns + "data",
new XElement (ns + "customer", "Bloggs"),
new XElement (ns + "purchase", "Bicycle")
);


The X-DOM does, however, apply default namespaces when reading and outputting 
XML: 


Console.WriteLine (data.ToString()); 


OUTPUT: 
<data xmlns="http://domain.com/xmlspace">
<customer>Bloggs</customer> 
<purchase>Bicycle</purchase> 
</data> 


Console.WriteLine (data.Element (ns + "customer").ToString()); 


OUTPUT: 
<customer xmlns="http://domain.com/xmlspace">Bloggs</customer>


If you construct XElement children without specifying namespaces, in other words:


XNamespace ns = "http://domain.com/xmlspace";
var data = new XElement (ns + "data", 
new XElement ("customer", "Bloggs"), 
new XElement ("purchase", "Bicycle") 
); 
Console.WriteLine (data.ToString()); 


you get this result, instead: 


<data xmlns="http://domain.com/xmlspace"> 
<customer xmlns="">Bloggs</customer>
<purchase xmlns="">Bicycle</purchase>
</data>
Another trap is failing to include a namespace when navigating an X-DOM:


XNamespace ns = "http://domain.com/xmlspace";

var data = new XElement (ns + "data",
new XElement (ns + "customer", "Bloggs"),
new XElement (ns + "purchase", "Bicycle")
);
XElement x = data.Element (ns + "customer"); // ok
XElement y = data.Element ("customer"); // null


If you build an X-DOM tree without specifying namespaces, you can subsequently 
assign every element to a single namespace, as follows: 


foreach (XElement e in data.DescendantsAndSelf()) 
if (e.Name.Namespace == "") 
e.Name = ns + e.Name.LocalName; 
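A runnable version of that loop (top-level statements, C# 9 or later), which also shows the lookup trap disappearing once the names carry the namespace:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace ns = "http://domain.com/xmlspace";
var data = new XElement ("data",
  new XElement ("customer", "Bloggs"),
  new XElement ("purchase", "Bicycle")
);
Console.WriteLine (data.Element (ns + "customer") == null);   // True: no namespace yet

// Move every element that has no namespace into ns
foreach (XElement e in data.DescendantsAndSelf())
  if (e.Name.Namespace == "")
    e.Name = ns + e.Name.LocalName;

Console.WriteLine (data.Element (ns + "customer") != null);   // True
Console.WriteLine (data.ToString());   // xmlns appears once, on <data>
```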


Prefixes 


The X-DOM treats prefixes just as it treats namespaces: purely as a serialization 
function. This means that you can choose to completely ignore the issue of prefixes 
—and get by! The only reason you might want to do otherwise is for efficiency 
when outputting to an XML file. For example, consider this: 


XNamespace ns1 = "http://domain.com/space1"; 
XNamespace ns2 = "http://domain.com/space2"; 


var mix = new XElement (ns1 + "data",
new XElement (ns2 + "element", "value"),
new XElement (ns2 + "element", "value"),
new XElement (ns2 + "element", "value")
);
By default, XElement will serialize this as follows:


<data xmlns="http://domain.com/space1"> 
<element xmlns="http://domain.com/space2">value</element>
<element xmlns="http://domain.com/space2">value</element>
<element xmlns="http://domain.com/space2">value</element>
</data> 


As you can see, there’s a bit of unnecessary duplication. The solution is not to 
change the way you construct the X-DOM, but instead to hint the serializer prior to 
writing the XML. Do this by adding attributes defining prefixes that you want to see 
applied. This is typically done on the root element: 


mix.SetAttributeValue (XNamespace.Xmlns + "ns1", ns1);
mix.SetAttributeValue (XNamespace.Xmlns + "ns2", ns2); 


This assigns the prefix “ns1” to our XNamespace variable ns1, and “ns2” to ns2. The 
X-DOM automatically picks up these attributes when serializing and uses them to 
condense the resulting XML. Here’s the result now of calling ToString on mix: 
<ns1:data xmlns:ns1="http://domain.com/space1"
xmlns:ns2="http://domain.com/space2">
<ns2:element>value</ns2:element>
<ns2:element>value</ns2:element>
<ns2:element>value</ns2:element>
</ns1:data>
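Put together as one program (top-level statements, C# 9 or later), the prefix hints can be checked on the serialized string; note that the element names themselves are unchanged:

```csharp
using System;
using System.Xml.Linq;

XNamespace ns1 = "http://domain.com/space1";
XNamespace ns2 = "http://domain.com/space2";

var mix = new XElement (ns1 + "data",
  new XElement (ns2 + "element", "value"),
  new XElement (ns2 + "element", "value"),
  new XElement (ns2 + "element", "value")
);

// Prefix hints: these attributes affect serialization, not the element names
mix.SetAttributeValue (XNamespace.Xmlns + "ns1", ns1);
mix.SetAttributeValue (XNamespace.Xmlns + "ns2", ns2);

string xml = mix.ToString();
Console.WriteLine (xml);
Console.WriteLine (mix.Name == ns1 + "data");   // True
```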


Prefixes don’t change the way you construct, query, or update the X-DOM—for 
these activities, you ignore the presence of prefixes and continue to use full names. 
Prefixes come into play only when converting to and from XML files or streams. 


Prefixes are also honored in serializing attributes. In the following example, we
record a customer's date of birth and credit as "nil" using the W3C-standard
attribute. The line that adds the XNamespace.Xmlns + "xsi" attribute ensures that the
prefix is serialized without unnecessary namespace repetition:


XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance"; 
var nil = new XAttribute (xsi + "nil", true); 


var cust = new XElement ("customers",
new XAttribute (XNamespace.Xmlns + "xsi", xsi),
new XElement ("customer",
new XElement ("lastname", "Bloggs"),
new XElement ("dob", nil),
new XElement ("credit", nil)
)
);


This is its XML: 


<customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<customer>
<lastname>Bloggs</lastname>
<dob xsi:nil="true" /> 
<credit xsi:nil="true" /> 
</customer> 
</customers> 


For brevity, we predeclared the nil XAttribute so that we could use it twice in 
building the DOM. You're allowed to reference the same attribute twice because it’s 
automatically duplicated as required. 
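A sketch of that duplication (top-level statements, C# 9 or later). Our understanding is that the first element to receive the attribute keeps the original object, and later elements receive copies:

```csharp
using System;
using System.Xml.Linq;

XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";
var nil = new XAttribute (xsi + "nil", true);

var dob = new XElement ("dob", nil);         // nil has no parent yet: attached as is
var credit = new XElement ("credit", nil);   // nil now has a parent: a copy is attached

XAttribute dobNil = dob.Attribute (xsi + "nil");
XAttribute creditNil = credit.Attribute (xsi + "nil");
Console.WriteLine (ReferenceEquals (dobNil, nil));      // True
Console.WriteLine (ReferenceEquals (creditNil, nil));   // False
Console.WriteLine (creditNil.Value);                    // true
```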


Annotations 


You can attach custom data to any XObject with an annotation. Annotations are
intended for your own private use and are treated as black boxes by the X-DOM. If
you've ever used the Tag property on a Windows Forms or WPF control, you'll be
familiar with the concept. The difference is that you can have multiple annotations, and
your annotations can be privately scoped. You can create an annotation that other
types cannot even see, let alone overwrite.


The following methods on XObject add and remove annotations: 
public void AddAnnotation (object annotation)
public void RemoveAnnotations<T>() where T : class 


The following methods retrieve annotations: 


public T Annotation<T>() where T : class 
public IEnumerable<T> Annotations<T>() where T : class 


Each annotation is keyed by its type, which must be a reference type. The following 
adds and then retrieves a string annotation: 


XElement e = new XElement ("test"); 
e.AddAnnotation ("Hello"); 
Console.WriteLine (e.Annotation<string>()); // Hello 


You can add multiple annotations of the same type, and then use the Annotations 
method to retrieve a sequence of matches. 
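A minimal sketch of multiple annotations of one type, followed by their removal (top-level statements, C# 9 or later). String keys are used only for brevity, despite the caveat about public key types that follows:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var e = new XElement ("test");
e.AddAnnotation ("Hello");
e.AddAnnotation ("World");

string first = e.Annotation<string>();                     // First match only
string all = string.Join (",", e.Annotations<string>());   // Every match, in order
Console.WriteLine (first);   // Hello
Console.WriteLine (all);     // Hello,World

e.RemoveAnnotations<string>();                         // Removes every string annotation
Console.WriteLine (e.Annotations<string>().Count());   // 0
```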


A public type such as string doesn’t make a great key, however, because code in
other types can interfere with your annotations. A better approach is to use an
internal or (nested) private class:


class X
{
  class CustomData { internal string Message; } // Private nested type

  static void Test()
  {
    XElement e = new XElement ("test");
    e.AddAnnotation (new CustomData { Message = "Hello" } );
    Console.Write (e.Annotations<CustomData>().First().Message); // Hello
  }
}


To remove annotations, you must also have access to the key’s type: 


e.RemoveAnnotations<CustomData>(); 


Projecting into an X-DOM 


So far, we've shown how to use LINQ to get data out of an X-DOM. You can also use 
LINQ queries to project into an X-DOM. The source can be anything over which 
LINQ can query, such as the following: 


• EF Core entity classes
• A local collection
• Another X-DOM

Regardless of the source, the strategy is the same in using LINQ to emit an X-DOM:
first write a functional construction expression that produces the desired X-DOM
shape and then build a LINQ query around the expression.
For instance, suppose that we want to retrieve customers from a database into the
following XML:


<customers> 
<customer id="1"> 
<name>Sue</name> 
<buys>3</buys> 
</customer> 
</customers> 


We start by writing a functional construction expression for the X-DOM using
simple literals:


var customers = 
new XElement ("customers", 
new XElement ("customer", new XAttribute ("id", 1), 
new XElement ("name", "Sue"), 
new XElement ("buys", 3) 
) 
); 


We then turn this into a projection and build a LINQ query around it: 


var customers = 
new XElement ("customers", 
// We must call AsEnumerable() due to a bug in EF Core. 
from c in dbContext.Customers.AsEnumerable() 
select 
new XElement ("customer", new XAttribute ("id", c.ID), 
new XElement ("name", c.Name), 
new XElement ("buys", c.Purchases.Count) 
) 
); 


The call to AsEnumerable is required due to a bug in EF Core
(a fix is scheduled for a later release). After the bug is fixed,
removing the call to AsEnumerable will improve efficiency by
avoiding a round trip with each call to c.Purchases.Count.


Here’s the result: 


<customers> 
<customer id="1"> 
<name>Tom</name> 
<buys>3</buys> 
</customer> 
<customer id="2"> 
<name>Harry</name> 
<buys>2</buys> 
</customer> 
</customers> 


We can see how this works more clearly by constructing the same query in two 
steps. 
First:


IEnumerable<XElement> sqlQuery =
from c in dbContext.Customers.AsEnumerable()
select
new XElement ("customer", new XAttribute ("id", c.ID),
new XElement ("name", c.Name),
new XElement ("buys", c.Purchases.Count)
);


This inner portion is a normal LINQ query that projects into XElements. Here's the
second step:


var customers = new XElement ("customers", sqlQuery);


This constructs the root XElement. The only thing unusual is that the content,
sqlQuery, is not a single XElement but an IEnumerable<XElement>. Remember that
in the processing of XML content, collections are automatically enumerated. So,
each XElement is added as a child node.
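To see the two steps without a database, here is a minimal sketch that swaps dbContext.Customers for an in-memory list. The tuple fields (ID, Name, PurchaseCount) are hypothetical stand-ins for the entity's properties, and the code is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory data standing in for dbContext.Customers.
var customerData = new List<(int ID, string Name, int PurchaseCount)>
{
  (1, "Tom", 3),
  (2, "Harry", 2)
};

// Step 1: an ordinary LINQ query that projects into XElements.
IEnumerable<XElement> query =
  from c in customerData
  select new XElement ("customer", new XAttribute ("id", c.ID),
           new XElement ("name", c.Name),
           new XElement ("buys", c.PurchaseCount));

// Step 2: the sequence is enumerated automatically; each XElement becomes a child.
var customers = new XElement ("customers", query);

Console.WriteLine (customers);
```

Because query is just an IEnumerable<XElement>, the root's constructor enumerates it and adds each customer element as a child, exactly as with the database version.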


Eliminating Empty Elements 


Suppose in the preceding example that we also wanted to include details of the
customer's most recent high-value purchase. We could do this as follows:


var customers =
  new XElement ("customers",
    // The AsEnumerable call can be removed when the EF Core bug is fixed.
    from c in dbContext.Customers.AsEnumerable()
    let lastBigBuy = (from p in c.Purchases
                      where p.Price > 1000
                      orderby p.Date descending
                      select p).FirstOrDefault()
    select
      new XElement ("customer", new XAttribute ("id", c.ID),
        new XElement ("name", c.Name),
        new XElement ("buys", c.Purchases.Count),
        new XElement ("lastBigBuy",
          new XElement ("description", lastBigBuy?.Description),
          new XElement ("price", lastBigBuy?.Price ?? 0m)
        )
      )
  );


This emits empty elements, though, for customers with no high-value purchases. (If
it were a local query and the code did not use the null-conditional operator, it
would throw a NullReferenceException.) In such cases, it would be better to omit
the lastBigBuy node entirely. We can achieve this by wrapping the constructor for
the lastBigBuy element in a conditional operator:


select
  new XElement ("customer", new XAttribute ("id", c.ID),
    new XElement ("name", c.Name),
    new XElement ("buys", c.Purchases.Count),
    lastBigBuy == null ? null :
      new XElement ("lastBigBuy",
        new XElement ("description", lastBigBuy.Description),
        new XElement ("price", lastBigBuy.Price)
      )
  )

For customers with no lastBigBuy, a null is emitted instead of an empty XElement. 
This is what we want, because null content is simply ignored. 
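The null-content rule can be checked with local data. This sketch assumes hypothetical customers that carry only a name and an array of purchase prices (not the EF Core entities above) and uses the same conditional-operator trick, so a customer with no purchase over 1000 gets no lastBigBuy element at all. It is written as a top-level program.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory data: Harry has no purchase over 1000.
var data = new[]
{
  (Name: "Tom",   Prices: new[] { 1500m, 200m }),
  (Name: "Harry", Prices: new[] { 50m })
};

var customers = new XElement ("customers",
  from c in data
  let bigBuy = c.Prices.Where (p => p > 1000).Select (p => (decimal?) p).FirstOrDefault()
  select new XElement ("customer",
    new XElement ("name", c.Name),
    // When bigBuy is null, the null argument is ignored and no element is emitted.
    bigBuy == null ? null : new XElement ("lastBigBuy", bigBuy)));

Console.WriteLine (customers);
```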


Streaming a Projection 


If you're projecting into an X-DOM only to Save it (or call ToString on it), you can
improve memory efficiency through an XStreamingElement. An XStreamingElement
is a cut-down version of XElement that applies deferred loading semantics to its
child content. To use it, you simply replace the outer XElements with
XStreamingElements:


var customers =
  new XStreamingElement ("customers",
    from c in dbContext.Customers
    select
      new XStreamingElement ("customer", new XAttribute ("id", c.ID),
        new XElement ("name", c.Name),
        new XElement ("buys", c.Purchases.Count)
      )
  );
customers.Save ("data.xml");


The queries passed into an XStreamingElement's constructor are not enumerated
until you call Save, ToString, or WriteTo on the element; this avoids loading the
whole X-DOM into memory at once. The flip side is that the queries are reevaluated,
should you re-Save. Also, you cannot traverse an XStreamingElement's child
content: it does not expose methods such as Elements or Attributes.
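The deferred evaluation is easy to observe with a local query. In this sketch (a top-level program; the names list is illustrative), an item added to the source after the XStreamingElement is constructed still appears in the output, because the query does not run until ToString is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var names = new List<string> { "Tom", "Harry" };

// The query is stored as child content; nothing is enumerated yet.
var streamed = new XStreamingElement ("customers",
  from n in names select new XElement ("customer", n));

names.Add ("Mary");   // Modify the source after constructing the element

// The query runs now, so Mary is included in the output.
string xml = streamed.ToString (SaveOptions.DisableFormatting);
Console.WriteLine (xml);
```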


XStreamingElement is not based on XObject—or any other class—because it has 
such a limited set of members. The only members it has, besides Save, ToString, 
and WriteTo, are the following: 


• An Add method, which accepts content like the constructor
• A Name property


XStreamingElement does not allow you to read content in a streamed fashion. For
this, you must use an XmlReader in conjunction with the X-DOM. We describe how
to do this in “Patterns for Using XmlReader/XmlWriter” on page 511 in Chapter 11.





500 | Chapter 10: LINQ to XML 


11 


Other XML and JSON Technologies








In Chapter 10, we covered the LINQ-to-XML API—and XML in general. In this 
chapter, we explore the low-level XmlReader/XmlWriter classes and the types for 
working with JavaScript Object Notation (JSON), which has become a popular 
alternative to XML. 


In the online supplement, we describe the tools for working with XML schema and 
stylesheets. 


XmlReader 


XmlReader is a high-performance class for reading an XML stream in a low-level, 
forward-only manner. 


Consider the following XML file, customer.xml: 


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
</customer>


To instantiate an XmlReader, you call the static XmlReader.Create method, passing
in a Stream, a TextReader, or a URI string:


using XmlReader reader = XmlReader.Create ("customer.xml");


Because XmlReader lets you read from potentially slow sources
(Streams and URIs), it offers asynchronous versions of most 
of its methods so that you can easily write nonblocking code. 
We cover asynchrony in detail in Chapter 14. 


To construct an XmlReader that reads from a string: 


using XmlReader reader = XmlReader.Create (
  new System.IO.StringReader (myString));


You can also pass in an XmlReaderSettings object to control parsing and validation
options. The following three properties on XmlReaderSettings are particularly
useful for skipping over superfluous content:


bool IgnoreComments // Skip over comment nodes? 
bool IgnoreProcessingInstructions // Skip over processing instructions? 
bool IgnoreWhitespace // Skip over whitespace? 


In the following example, we instruct the reader not to emit whitespace nodes, 
which are a distraction in typical scenarios: 


XmlReaderSettings settings = new XmlReaderSettings(); 
settings.IgnoreWhitespace = true; 


using XmlReader reader = XmlReader.Create ("customer.xml", settings); 


Another useful property on XmlReaderSettings is ConformanceLevel. Its default
value of Document instructs the reader to assume a valid XML document with a
single root node. This is a problem if you want to read just an inner portion of XML,
containing multiple nodes:


<firstname>Jim</firstname>
<lastname>Bo</lastname>


To read this without throwing an exception, you must set ConformanceLevel to 
Fragment. 
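The difference between the two conformance levels can be seen by parsing that fragment from a string. This is a small sketch written as a top-level program; only the fragment text is taken from above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

string fragment = "<firstname>Jim</firstname><lastname>Bo</lastname>";

// With the default ConformanceLevel (Document), the second root element is rejected.
bool threw = false;
try
{
  using XmlReader strict = XmlReader.Create (new StringReader (fragment));
  while (strict.Read()) { }
}
catch (XmlException) { threw = true; }

// With ConformanceLevel.Fragment, both top-level elements are accepted.
var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
var names = new List<string>();
using (XmlReader reader = XmlReader.Create (new StringReader (fragment), settings))
{
  while (reader.Read())
    if (reader.NodeType == XmlNodeType.Element)
      names.Add (reader.Name);
}

Console.WriteLine (threw);                      // True
Console.WriteLine (string.Join (",", names));   // firstname,lastname
```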


XmlReaderSettings also has a property called CloseInput, which indicates whether
to close the underlying stream when the reader is closed (there’s an analogous
property on XmlWriterSettings called CloseOutput). The default value for CloseInput
and CloseOutput is false.
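A quick way to observe CloseInput is to read from a MemoryStream and check whether the stream is still usable afterward. This sketch assumes a trivial document (<a/>) and relies on MemoryStream.CanRead, which returns false once the stream has been disposed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

var stream1 = new MemoryStream (Encoding.UTF8.GetBytes ("<a/>"));
using (XmlReader r = XmlReader.Create (stream1))   // CloseInput defaults to false
  while (r.Read()) { }
bool stillOpen = stream1.CanRead;                  // The stream was left open

var stream2 = new MemoryStream (Encoding.UTF8.GetBytes ("<a/>"));
var settings = new XmlReaderSettings { CloseInput = true };
using (XmlReader r = XmlReader.Create (stream2, settings))
  while (r.Read()) { }
bool closed = !stream2.CanRead;                    // Disposing the reader closed the stream

Console.WriteLine ($"{stillOpen} {closed}");       // True True
```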


Reading Nodes 


The units of an XML stream are XML nodes. The reader traverses the stream in
textual (depth-first) order. The Depth property of the reader returns the current
depth of the cursor.


The most primitive way to read from an XmlReader is to call Read. It advances to the 
next node in the XML stream, rather like MoveNext in IEnumerator. The first call to 
Read positions the cursor at the first node. When Read returns false, it means the 
cursor has advanced past the last node, at which point the XmlReader should be 
closed and abandoned. 


Two string properties on XmlReader provide access to a node’s content: Name and 
Value. Depending on the node type, either Name or Value (or both) are populated. 


In this example, we read every node in the XML stream, outputting each node type
as we go:


XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using XmlReader reader = XmlReader.Create ("customer.xml", settings);
while (reader.Read())
{
  Console.Write (new string (' ', reader.Depth * 2));  // Write indentation
  Console.Write (reader.NodeType.ToString());

  if (reader.NodeType == XmlNodeType.Element ||
      reader.NodeType == XmlNodeType.EndElement)
  {
    Console.Write (" Name=" + reader.Name);
  }
  else if (reader.NodeType == XmlNodeType.Text)
  {
    Console.Write (" Value=" + reader.Value);
  }
  Console.WriteLine ();
}


The output is as follows: 


XmlDeclaration
Element Name=customer
  Element Name=firstname
    Text Value=Jim
  EndElement Name=firstname
  Element Name=lastname
    Text Value=Bo
  EndElement Name=lastname
EndElement Name=customer


Attributes are not included in Read-based traversal (see 
“Reading Attributes” on page 507). 


NodeType is of type XmlNodeType, which is an enum with these members: 


None              Comment                  Document
XmlDeclaration    Entity                   DocumentType
Element           EndEntity                DocumentFragment
EndElement        EntityReference          Notation
Text              ProcessingInstruction    Whitespace
Attribute         CDATA                    SignificantWhitespace


Reading Elements


Often, you already know the structure of the XML document that you're reading. To
help with this, XmlReader provides a range of methods that read while presuming a
particular structure. This simplifies your code and performs some validation at the
same time.
XmlReader throws an XmlException if any validation fails.
XmlException has LineNumber and LinePosition properties 
indicating where the error occurred—logging this information 
is essential if the XML file is large! 
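The following sketch shows where that line information comes from. It assumes an in-memory document in which <firstname> is missing, so the structural check in ReadStartElement fails; the exception's LineNumber then points at the offending <lastname> element on line 2. It is written as a top-level program.

```csharp
using System;
using System.IO;
using System.Xml;

// <firstname> is missing, so ReadStartElement ("firstname") fails on line 2.
string xml = "<customer>\n<lastname>Bo</lastname>\n</customer>";

int line = 0;
try
{
  using XmlReader r = XmlReader.Create (new StringReader (xml));
  r.MoveToContent();
  r.ReadStartElement ("customer");
  r.ReadStartElement ("firstname");   // Throws: the current element is <lastname>
}
catch (XmlException ex)
{
  line = ex.LineNumber;
  Console.WriteLine ($"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}");
}
```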


ReadStartElement verifies that the current NodeType is Element and then calls 
Read. If you specify a name, it verifies that it matches that of the current element. 


ReadEndElement verifies that the current NodeType is EndElement and then calls 
Read. 


For instance, we could read this:


<firstname>Jim</firstname>


as follows:


reader.ReadStartElement ("firstname");
Console.WriteLine (reader.Value);
reader.Read();
reader.ReadEndElement();


The ReadElementContentAsString method does all of this in one hit. It reads a start 
element, a text node, and an end element, returning the content as a string: 


string firstName = reader.ReadElementContentAsString ("firstname", ""); 


The second argument refers to the namespace, which is blank in this example. 
There are also typed versions of this method, such as ReadElementContentAsInt, 
which parse the result. Returning to our original XML document: 


<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<customer id="123" status="archived">
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
  <creditlimit>500.00</creditlimit>  <!-- OK, we sneaked this in! -->
</customer>


We could read it in as follows: 


XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using XmlReader r = XmlReader.Create ("customer.xml", settings);

r.MoveToContent();                 // Skip over the XML declaration
r.ReadStartElement ("customer");
string firstName = r.ReadElementContentAsString ("firstname", "");
string lastName = r.ReadElementContentAsString ("lastname", "");
decimal creditLimit = r.ReadElementContentAsDecimal ("creditlimit", "");

r.MoveToContent();                 // Skip over that pesky comment
r.ReadEndElement();                // Read the closing customer tag
The MoveToContent method is really useful. It skips over all
the fluff: XML declarations, whitespace, comments, and
processing instructions. You can also instruct the reader to do
most of this automatically through the properties on
XmlReaderSettings.


Optional elements


In the previous example, suppose that <lastname> was optional. The solution to
this is straightforward:


r.ReadStartElement ("customer");
string firstName = r.ReadElementContentAsString ("firstname", "");
string lastName  = r.Name == "lastname"
                   ? r.ReadElementContentAsString() : null;
decimal creditLimit = r.ReadElementContentAsDecimal ("creditlimit", "");
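Here is a self-contained version of the optional-element pattern that reads from strings instead of customer.xml. The local function ReadCustomer and the second sample document (without <lastname>) are hypothetical, added only for illustration; the code is a top-level program.

```csharp
using System;
using System.IO;
using System.Xml;

var a = ReadCustomer ("<customer><firstname>Jim</firstname><lastname>Bo</lastname>" +
                      "<creditlimit>500.00</creditlimit></customer>");
var b = ReadCustomer ("<customer><firstname>Sue</firstname>" +
                      "<creditlimit>250</creditlimit></customer>");   // No <lastname>

Console.WriteLine ($"{a.First} {a.Last}");               // Jim Bo
Console.WriteLine ($"{b.First} {b.Last ?? "(none)"}");   // Sue (none)

// Hypothetical helper: reads one customer whose <lastname> element is optional.
(string First, string Last, decimal Credit) ReadCustomer (string xml)
{
  var settings = new XmlReaderSettings { IgnoreWhitespace = true };
  using XmlReader r = XmlReader.Create (new StringReader (xml), settings);
  r.MoveToContent();
  r.ReadStartElement ("customer");
  string firstName = r.ReadElementContentAsString ("firstname", "");
  string lastName = r.Name == "lastname" ? r.ReadElementContentAsString() : null;
  decimal creditLimit = r.ReadElementContentAsDecimal ("creditlimit", "");
  r.ReadEndElement();
  return (firstName, lastName, creditLimit);
}
```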


Random element order 


The examples in this section rely on elements appearing in the XML file in a set 
order. If you need to cope with elements appearing in any order, the easiest solution 
is to read that section of the XML into an X-DOM. We describe how to do this later 
in “Patterns for Using XmlReader/XmlWriter” on page 511. 


Empty elements 


The way that XmlReader handles empty elements presents a horrible trap. Consider 
the following element: 


<customerList></customerList>


In XML, this is equivalent to the following:


<customerList/>


And yet, XmlReader treats the two differently. In the first case, the following code 
works as expected: 


reader.ReadStartElement ("customerList");
reader.ReadEndElement();


In the second case, ReadEndElement throws an exception because there is no
separate “end element” as far as XmlReader is concerned. The workaround is to
check for an empty element:


bool isEmpty = reader.IsEmptyElement;
reader.ReadStartElement ("customerList");
if (!isEmpty) reader.ReadEndElement();


In reality, this is a nuisance only when the element in question might contain child
elements (such as a customer list). With elements that wrap simple text (such as
firstname), you can avoid the entire issue by calling a method such as
ReadElementContentAsString. The ReadElementXXX methods handle both kinds of
empty elements correctly.
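The trap, and the workaround, can be reproduced with two strings. In this sketch (a top-level program), the local function ReadList, a hypothetical name, captures IsEmptyElement before calling ReadStartElement, because that call moves the reader off the element.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: reads a <customerList> element, coping with both empty forms.
void ReadList (string xml)
{
  using XmlReader reader = XmlReader.Create (new StringReader (xml));
  reader.MoveToContent();
  bool isEmpty = reader.IsEmptyElement;   // Capture before ReadStartElement moves on
  reader.ReadStartElement ("customerList");
  if (!isEmpty) reader.ReadEndElement();
}

ReadList ("<customerList></customerList>");   // Succeeds
ReadList ("<customerList/>");                 // Also succeeds, thanks to the check

// Without the check, the self-closing form fails.
bool threw = false;
try
{
  using XmlReader reader = XmlReader.Create (new StringReader ("<customerList/>"));
  reader.MoveToContent();
  reader.ReadStartElement ("customerList");
  reader.ReadEndElement();   // There is no separate end element to read
}
catch (XmlException) { threw = true; }

Console.WriteLine (threw);   // True
```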
Other ReadXXX methods


Table 11-1 summarizes all ReadXXX methods in XmlReader. Most of these are 
designed to work with elements. The sample XML fragment shown in bold is the 
section read by the method described. 


Table 11-1. Read methods 


Members                    Works on    Sample XML fragment   Input        Data
                           NodeType                          parameters   returned
ReadContentAsXXX           Text        <a>x</a>                           x
ReadElementContentAsXXX    Element     <a>x</a>                           x
ReadInnerXml               Element     <a>x</a>                           x
ReadOuterXml               Element     <a>x</a>                           <a>x</a>
ReadStartElement           Element     <a>x</a>
ReadEndElement             Element     <a>x</a>
ReadSubtree                Element     <a>x</a>                           <a>x</a>
ReadToDescendant           Element     <a>x<b></b></a>       "b"
ReadToFollowing            Element     <a>x<b></b></a>       "b"
ReadToNextSibling          Element     <a>x</a><b></b>       "b"
ReadAttributeValue         Attribute   See “Reading Attributes” on page 507





The ReadContentAsXXX methods parse a text node into type XXX. Internally, the
XmlConvert class performs the string-to-type conversion. The text node can be
within an element or an attribute.


The ReadElementContentAsXXX methods are wrappers around corresponding
ReadContentAsXXX methods. They apply to the element node rather than the text
node enclosed by the element.


ReadInnerXml is typically applied to an element, and it reads and returns the
element's content: all of its descendants, excluding the element itself. When applied
to an attribute, it returns the value of the attribute. ReadOuterXml is the same except
that it includes rather than excludes the element at the cursor position.


ReadSubtree returns a proxy reader that provides a view over just the current
element (and its descendants). The proxy reader must be closed before the original
reader can be safely read again. When the proxy reader is closed, the cursor position
of the original reader moves to the end of the subtree.


ReadToDescendant moves the cursor to the start of the first descendant node with 
the specified name/namespace. ReadToFollowing moves the cursor to the start of 
the first node—regardless of depth—with the specified name/namespace. 
ReadToNextSibling moves the cursor to the start of the first sibling node with the
specified name/namespace. 


There are also two legacy methods: ReadString and ReadElementString behave 
like ReadContentAsString and ReadElementContentAsString, except that they 
throw an exception if there’s more than a single text node within the element. You 
should avoid these methods because they throw an exception if an element contains 
a comment. 
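Several of the methods in Table 11-1 can be compared on a single fragment. This sketch opens a fresh reader for each method (via a hypothetical local function, Open) so that each one starts positioned on <a>; it is written as a top-level program.

```csharp
using System;
using System.IO;
using System.Xml;

string xml = "<a>x<b>y</b></a>";

// Hypothetical helper: returns a reader positioned on the <a> element.
XmlReader Open()
{
  XmlReader r = XmlReader.Create (new StringReader (xml));
  r.MoveToContent();
  return r;
}

string inner, outer, bText;
bool found;

using (XmlReader r = Open()) inner = r.ReadInnerXml();   // Excludes <a> itself
using (XmlReader r = Open()) outer = r.ReadOuterXml();   // Includes <a>
using (XmlReader r = Open())
{
  found = r.ReadToDescendant ("b");          // Cursor moves to <b>
  bText = r.ReadElementContentAsString();    // Reads the text inside <b>
}

Console.WriteLine (inner);               // x<b>y</b>
Console.WriteLine (outer);               // <a>x<b>y</b></a>
Console.WriteLine ($"{found} {bText}");  // True y
```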


Reading Attributes 


XmlReader provides an indexer giving you direct (random) access to an element's 
attributes—by name or position. Using the indexer is equivalent to calling 
GetAttribute. 


Given the XML fragment: 
<customer id="123" status="archived"/> 


we could read its attributes as follows: 


Console.WriteLine (reader ["id"]); // 123 
Console.WriteLine (reader ["status"]); // archived 
Console.WriteLine (reader ["bogus"] == null); // True 


The XmlReader must be positioned on a start element 
in order to read attributes. After calling ReadStartElement, 
the attributes are gone forever! 


Although attribute order is semantically irrelevant, you can access attributes by 
their ordinal position. We could rewrite the preceding example as follows: 


Console.WriteLine (reader [0]); // 123 
Console.WriteLine (reader [1]); // archived 


The indexer also lets you specify the attribute’s namespace—if it has one. 


AttributeCount returns the number of attributes for the current node. 
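Here is the indexer example as a runnable sketch that reads the fragment from a string (a top-level program). MoveToContent positions the reader on the start element, which is required before attributes can be read.

```csharp
using System;
using System.IO;
using System.Xml;

using XmlReader reader = XmlReader.Create (
  new StringReader ("<customer id=\"123\" status=\"archived\"/>"));
reader.MoveToContent();                 // Must be on the start element to read attributes

string id      = reader ["id"];         // Same as reader.GetAttribute ("id")
string byIndex = reader [0];            // First attribute by ordinal position
string missing = reader ["bogus"];      // null: no such attribute
int count      = reader.AttributeCount;

Console.WriteLine ($"{id} {byIndex} {missing == null} {count}");   // 123 123 True 2
```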


Attribute nodes 


To explicitly traverse attribute nodes, you must make a special diversion from the 
normal path of just calling Read. A good reason to do so is if you want to parse 
attribute values into other types, via the ReadContentAsXXX methods. 


The diversion must begin from a start element. To make the job easier, the
forward-only rule is relaxed during attribute traversal: you can jump to any attribute
(forward or backward) by calling MoveToAttribute.


MoveToElement returns you to the start element from
anyplace within the attribute node diversion.
Returning to our previous example:


<customer id="123" status="archived"/>


we can do this:


reader.MoveToAttribute ("status");
string status = reader.ReadContentAsString();

reader.MoveToAttribute ("id");
int id = reader.ReadContentAsInt();


MoveToAttribute returns false if the specified attribute doesn't exist. 


You can also traverse each attribute in sequence by calling the
MoveToFirstAttribute and then the MoveToNextAttribute methods:


if (reader.MoveToFirstAttribute())
  do { Console.WriteLine (reader.Name + "=" + reader.Value); }
  while (reader.MoveToNextAttribute());

// OUTPUT:
id=123
status=archived
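The attribute diversion and the sequential traversal above can be combined in one sketch over the same fragment (a top-level program). It collects the name/value pairs into a list rather than printing them as it goes, which is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

using XmlReader reader = XmlReader.Create (
  new StringReader ("<customer id=\"123\" status=\"archived\"/>"));
reader.MoveToContent();

// Jump between attributes in any order, parsing each as a typed value.
reader.MoveToAttribute ("status");
string status = reader.ReadContentAsString();
reader.MoveToAttribute ("id");
int id = reader.ReadContentAsInt();
bool hasBogus = reader.MoveToAttribute ("bogus");   // false; position is unchanged

// Visit every attribute in document order.
var pairs = new List<string>();
if (reader.MoveToFirstAttribute())
  do { pairs.Add (reader.Name + "=" + reader.Value); }
  while (reader.MoveToNextAttribute());

reader.MoveToElement();                             // Back to the start element
Console.WriteLine (string.Join (" ", pairs));       // id=123 status=archived
```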


Namespaces and Prefixes


XmlReader provides two parallel systems for referring to element and attribute
names:


• Name
• NamespaceURI and LocalName


Whenever you read an element's Name property or call a method that accepts a single
name argument, you're using the first system. This works well if no namespaces or
prefixes are present; otherwise, it acts in a crude and literal manner. Namespaces are
ignored, and prefixes are included exactly as they were written; for example:


Sample fragment                 Name
<customer ...>                  customer
<customer xmlns='blah' ...>     customer
<x:customer ...>                x:customer


The following code works with the first two cases:


reader.ReadStartElement ("customer");


The following is required to handle the third case:


reader.ReadStartElement ("x:customer");
The second system works through two namespace-aware properties: NamespaceURI
and LocalName. These properties take into account prefixes and default namespaces
defined by parent elements. Prefixes are automatically expanded. This means that
NamespaceURI always reflects the semantically correct namespace for the current
element, and LocalName is always free of prefixes.


When you pass two name arguments into a method such as ReadStartElement,
you're using this same system. For example, consider the following XML:


<customer xmlns="DefaultNamespace" xmlns:other="OtherNamespace">
  <address>
    <other:city>


We could read this as follows: 


reader.ReadStartElement ("customer", "DefaultNamespace");
reader.ReadStartElement ("address", "DefaultNamespace");
reader.ReadStartElement ("city", "OtherNamespace");


Abstracting away prefixes is usually exactly what you want. If necessary, you can see
what prefix was used through the Prefix property and convert it into a namespace
by calling LookupNamespace.
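The two naming systems can be compared directly. This sketch (a top-level program) reads the namespace example above from a string, with closing tags added so that the document is well formed, and inspects the reader once it is positioned on <other:city>.

```csharp
using System;
using System.IO;
using System.Xml;

string xml =
  "<customer xmlns=\"DefaultNamespace\" xmlns:other=\"OtherNamespace\">" +
  "<address><other:city/></address></customer>";

using XmlReader reader = XmlReader.Create (new StringReader (xml));
reader.ReadStartElement ("customer", "DefaultNamespace");
reader.ReadStartElement ("address", "DefaultNamespace");

// The reader is now positioned on <other:city/>.
string name      = reader.Name;                      // "other:city" (prefix included)
string localName = reader.LocalName;                 // "city"
string ns        = reader.NamespaceURI;              // "OtherNamespace"
string prefix    = reader.Prefix;                    // "other"
string resolved  = reader.LookupNamespace (prefix);  // "OtherNamespace"

Console.WriteLine ($"{name} | {localName} | {ns} | {prefix} | {resolved}");
```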


XmlWriter


XmlWriter is a forward-only writer of an XML stream. The design of XmlWriter is 
symmetrical to XmlReader. 


As with XmlReader, you construct an XmlWriter by calling Create with an
optional settings object. In the following example, we enable indenting to make
the output more human-readable and then write a simple XML file:


XmlWriterSettings settings = new XmlWriterSettings(); 
settings.Indent = true; 


using XmlWriter writer = XmlWriter.Create ("foo.xml", settings);

writer.WriteStartElement ("customer");
writer.WriteElementString ("firstname", "Jim");
writer.WriteElementString ("lastname", "Bo");
writer.WriteEndElement();


This produces the following document (the same as the file we read in the first 
example of XmlReader): 


<?xml version="1.0" encoding="utf-8"?>
<customer>
  <firstname>Jim</firstname>
  <lastname>Bo</lastname>
</customer>


XmlWriter automatically writes the declaration at the top unless you indicate
otherwise in XmlWriterSettings by setting OmitXmlDeclaration to true or
ConformanceLevel to Fragment. The latter also permits writing multiple root nodes,
something that otherwise throws an exception.


The WriteValue method writes a single text node. It accepts both string and
nonstring types such as bool and DateTime, internally calling XmlConvert to perform
XML-compliant string conversions:


writer.WriteStartElement ("birthdate"); 
writer.WriteValue (DateTime.Now); 
writer.WriteEndElement(); 


In contrast, if we call:


writer.WriteElementString ("birthdate", DateTime.Now.ToString());


the result would be both non-XML-compliant and vulnerable to incorrect parsing.


WriteString is equivalent to calling WriteValue with a string. XmlWriter
automatically escapes characters that would otherwise be illegal within an attribute or
element, such as & < >, and extended Unicode characters.
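The following sketch contrasts WriteValue's XmlConvert-based output with automatic escaping. It writes to a StringWriter and omits the XML declaration for brevity; the element names and values are illustrative, and the code is a top-level program.

```csharp
using System;
using System.IO;
using System.Xml;

var sw = new StringWriter();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
using (XmlWriter writer = XmlWriter.Create (sw, settings))
{
  writer.WriteStartElement ("customer");

  writer.WriteStartElement ("vip");
  writer.WriteValue (true);                                // XmlConvert form: "true"
  writer.WriteEndElement();

  writer.WriteStartElement ("birthdate");
  writer.WriteValue (new DateTime (2020, 1, 2, 3, 4, 5));  // ISO 8601 form, not culture-specific
  writer.WriteEndElement();

  writer.WriteElementString ("note", "Fish & Chips < 5");  // & and < are escaped
  writer.WriteEndElement();
}

string xml = sw.ToString();
Console.WriteLine (xml);
```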


Writing Attributes


You can write attributes immediately after writing a start element:


writer.WriteStartElement ("customer"); 
writer.WriteAttributeString ("id", "1"); 
writer.WriteAttributeString ("status", "archived"); 


To write nonstring values, call WriteStartAttribute, WriteValue, and then
WriteEndAttribute.
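A short sketch of writing nonstring attribute values through WriteStartAttribute, WriteValue, and WriteEndAttribute (a top-level program). The attribute names and values are illustrative, and the output goes to a StringWriter.

```csharp
using System;
using System.IO;
using System.Xml;

var sw = new StringWriter();
using (XmlWriter writer = XmlWriter.Create (sw, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
  writer.WriteStartElement ("customer");
  writer.WriteAttributeString ("status", "archived");   // String attribute
  writer.WriteStartAttribute ("id");                     // Nonstring attribute: the value
  writer.WriteValue (123);                               //   goes through XmlConvert
  writer.WriteEndAttribute();
  writer.WriteStartAttribute ("vip");
  writer.WriteValue (false);
  writer.WriteEndAttribute();
  writer.WriteEndElement();
}

string xml = sw.ToString();
Console.WriteLine (xml);   // <customer status="archived" id="123" vip="false" />
```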


Writing Other Node Types 


XmlWriter also defines the following methods for writing other kinds of nodes: 


WriteBase64     // for binary data
WriteBinHex     // for binary data
WriteCData
WriteComment
WriteDocType
WriteEntityRef
WriteProcessingInstruction
WriteRaw
WriteWhitespace


WriteRaw directly injects a string into the output stream. There is also a WriteNode 
method that accepts an XmlReader, echoing everything from the given XmlReader. 


Namespaces and Prefixes 


The overloads for the Write* methods allow you to associate an element or attribute
with a namespace. Let’s rewrite the contents of the XML file in our previous
example. This time we will associate all of the elements with the http://oreilly.com
namespace, declaring the prefix o at the customer element:


writer.WriteStartElement ("o", "customer", "http://oreilly.com");
writer.WriteElementString ("o", "firstname", "http://oreilly.com", "Jim");
writer.WriteElementString ("o", "lastname", "http://oreilly.com", "Bo");
writer.WriteEndElement();


The output is now as follows: 


<?xml version="1.0" encoding="utf-8"?>
<o:customer xmlns:o="http://oreilly.com">
  <o:firstname>Jim</o:firstname>
  <o:lastname>Bo</o:lastname>
</o:customer>
Notice how for brevity XmlWriter omits the child element’s namespace declarations
when they are already declared by the parent element. 


Patterns for Using XmlReader/XmlWriter 


Working with Hierarchical Data


Consider the following classes:


public class Contacts
{
  public IList<Customer> Customers = new List<Customer>();
  public IList<Supplier> Suppliers = new List<Supplier>();
}

public class Customer { public string FirstName, LastName; }
public class Supplier { public string Name; }


Suppose that you want to use XmlReader and XmlWriter to serialize a Contacts 
object to XML, as in the following: 


<?xml version="1.0" encoding="utf-8"?>
<contacts>
  <customer id="1">
    <firstname>Jay</firstname>
    <lastname>Dee</lastname>
  </customer>
  <customer>                     <!-- we'll assume id is optional -->
    <firstname>Kay</firstname>
    <lastname>Gee</lastname>
  </customer>
  <supplier>
    <name>X Technologies Ltd</name>
  </supplier>
</contacts>


The best approach is not to write one big method, but to encapsulate XML
functionality in the Customer and Supplier types themselves by writing ReadXml and
WriteXml methods on these types. The pattern in doing so is straightforward:







• ReadXml and WriteXml leave the reader/writer at the same depth when they
  exit.

• ReadXml reads the outer element, whereas WriteXml writes only its inner
  content.


Here's how we would write the Customer type: 


public class Customer
{
  public const string XmlName = "customer";
  public int? ID;
  public string FirstName, LastName;

  public Customer () { }
  public Customer (XmlReader r) { ReadXml (r); }

  public void ReadXml (XmlReader r)
  {
    if (r.MoveToAttribute ("id")) ID = r.ReadContentAsInt();
    r.ReadStartElement();
    FirstName = r.ReadElementContentAsString ("firstname", "");
    LastName = r.ReadElementContentAsString ("lastname", "");
    r.ReadEndElement();
  }

  public void WriteXml (XmlWriter w)
  {
    if (ID.HasValue) w.WriteAttributeString ("id", "", ID.ToString());
    w.WriteElementString ("firstname", FirstName);
    w.WriteElementString ("lastname", LastName);
  }
}


Notice that ReadXml reads the outer start and end element nodes. If its caller did this
job instead, Customer couldn't read its own attributes. The reason for not making
WriteXml symmetrical in this regard is twofold:


• The caller might need to choose how the outer element is named.

• The caller might need to write extra XML attributes, such as the element's subtype
  (which could then be used to decide which class to instantiate when reading back
  the element).


Another benefit of following this pattern is that it makes your implementation
compatible with IXmlSerializable (see “IXmlSerializable” on page 736 in Chapter 17).


The Supplier class is analogous: 


public class Supplier
{
  public const string XmlName = "supplier";
  public string Name;

  public Supplier () { }
  public Supplier (XmlReader r) { ReadXml (r); }

  public void ReadXml (XmlReader r)
  {
    r.ReadStartElement();
    Name = r.ReadElementContentAsString ("name", "");
    r.ReadEndElement();
  }

  public void WriteXml (XmlWriter w) =>
    w.WriteElementString ("name", Name);
}


With the Contacts class, we must enumerate the children of the contacts element in
ReadXml, checking whether each subelement is a customer or a supplier. We also
need to code around the empty element trap:


public void ReadXml (XmlReader r)
{
  bool isEmpty = r.IsEmptyElement;           // This ensures we don't get
  r.ReadStartElement();                      // snookered by an empty
  if (isEmpty) return;                       // <contacts/> element!
  while (r.NodeType == XmlNodeType.Element)
  {
    if (r.Name == Customer.XmlName)      Customers.Add (new Customer (r));
    else if (r.Name == Supplier.XmlName) Suppliers.Add (new Supplier (r));
    else
      throw new XmlException ("Unexpected node: " + r.Name);
  }
  r.ReadEndElement();
}

public void WriteXml (XmlWriter w)
{
  foreach (Customer c in Customers)
  {
    w.WriteStartElement (Customer.XmlName);
    c.WriteXml (w);
    w.WriteEndElement();
  }
  foreach (Supplier s in Suppliers)
  {
    w.WriteStartElement (Supplier.XmlName);
    s.WriteXml (w);
    w.WriteEndElement();
  }
}


Here's how to serialize a Contacts object populated with Customers and Suppliers 
to an XML file: 


var settings = new XmlWriterSettings();
settings.Indent = true;  // To make visual inspection easier
using XmlWriter writer = XmlWriter.Create ("contacts.xml", settings);

var cts = new Contacts();
// Add Customers and Suppliers...

writer.WriteStartElement ("contacts");
cts.WriteXml (writer);
writer.WriteEndElement();


Here's how to deserialize from the same file: 


var settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
settings.IgnoreComments = true;
settings.IgnoreProcessingInstructions = true;

using XmlReader reader = XmlReader.Create ("contacts.xml", settings);
reader.MoveToContent();
var cts = new Contacts();
cts.ReadXml (reader);
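The same pattern can be exercised end to end without files. This sketch is a simplification rather than the Contacts/Customer classes above: it uses hypothetical local functions, WriteCustomer and ReadCustomer, over a tuple, and round-trips two customers (one without an id) through a StringWriter. It is written as a top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

var input = new List<(int? ID, string First, string Last)>
{
  (1, "Jay", "Dee"),
  (null, "Kay", "Gee")       // id is optional
};

// Writing: the caller owns each outer <customer> element.
var sw = new StringWriter();
using (XmlWriter w = XmlWriter.Create (sw))
{
  w.WriteStartElement ("contacts");
  foreach (var c in input)
  {
    w.WriteStartElement ("customer");
    WriteCustomer (w, c);          // Writes only the inner content
    w.WriteEndElement();
  }
  w.WriteEndElement();
}

// Reading: ReadCustomer consumes the whole <customer> element, attributes included.
var output = new List<(int? ID, string First, string Last)>();
var settings = new XmlReaderSettings { IgnoreWhitespace = true };
using (XmlReader r = XmlReader.Create (new StringReader (sw.ToString()), settings))
{
  r.MoveToContent();
  bool isEmpty = r.IsEmptyElement;
  r.ReadStartElement ("contacts");
  if (!isEmpty)
  {
    while (r.NodeType == XmlNodeType.Element && r.Name == "customer")
      output.Add (ReadCustomer (r));
    r.ReadEndElement();
  }
}

foreach (var c in output)
  Console.WriteLine ($"{c.ID?.ToString() ?? "(no id)"} {c.First} {c.Last}");

// Hypothetical helpers following the WriteXml/ReadXml conventions described above.
void WriteCustomer (XmlWriter w, (int? ID, string First, string Last) c)
{
  if (c.ID.HasValue) w.WriteAttributeString ("id", "", c.ID.ToString());
  w.WriteElementString ("firstname", c.First);
  w.WriteElementString ("lastname", c.Last);
}

(int? ID, string First, string Last) ReadCustomer (XmlReader r)
{
  int? id = null;
  if (r.MoveToAttribute ("id")) id = r.ReadContentAsInt();
  r.ReadStartElement();
  string first = r.ReadElementContentAsString ("firstname", "");
  string last = r.ReadElementContentAsString ("lastname", "");
  r.ReadEndElement();
  return (id, first, last);
}
```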


Mixing XmlReader/XmlWriter with an X-DOM


You can fly in an X-DOM at any point in the XML tree where XmlReader or
XmlWriter becomes too cumbersome. Using the X-DOM to handle inner elements
is an excellent way to combine the X-DOM's ease of use with the low-memory
footprint of XmlReader and XmlWriter.


Using XmlReader with XElement


To read the current element into an X-DOM, you call XNode.ReadFrom, passing in
the XmlReader. Unlike XElement.Load, this method is not “greedy” in that it doesn’t
expect to see a whole document. Instead, it reads just to the end of the current subtree.


For instance, suppose that we have an XML logfile structured as follows: 


<log>
  <logentry id="1">
    <date>...</date>
    <source>...</source>
  </logentry>
</log>


If there were a million logentry elements, reading the entire thing into an X-DOM
would waste memory. A better solution is to traverse each logentry with an
XmlReader and then use XElement to process the elements individually:


XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;

using XmlReader r = XmlReader.Create ("logfile.xml", settings);

r.ReadStartElement ("log");
while (r.Name == "logentry")
{
  XElement logEntry = (XElement) XNode.ReadFrom (r);
  int id = (int) logEntry.Attribute ("id");
  DateTime date = (DateTime) logEntry.Element ("date");
  string source = (string) logEntry.Element ("source");
}
r.ReadEndElement();


If you follow the pattern described in the previous section, you can slot an XElement 
into a custom type’s ReadXml or WriteXml method without the caller ever knowing 
you've cheated! For instance, we could rewrite Customer’s ReadXml method as 
follows: 


public void ReadXml (XmlReader r)
{
  XElement x = (XElement) XNode.ReadFrom (r);
  ID = (int?) x.Attribute ("id");      // null if the optional id is absent
  FirstName = (string) x.Element ("firstname");
  LastName = (string) x.Element ("lastname");
}


XElement collaborates with XmlReader to ensure that namespaces are kept intact, 
and prefixes are properly expanded—even if defined at an outer level. So, if our 
XML file read like this: 


<log xmlns="http://loggingspace"> 
<logentry id="1"> 


the XElements we constructed at the logentry level would correctly inherit the 
outer namespace. 
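The ReadFrom loop can be tried on an in-memory log. In this sketch (a top-level program; the log contents are invented for illustration), the root carries a default namespace, so the source element must be looked up with that namespace, while the unprefixed id attribute has none.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

string xml =
  "<log xmlns=\"http://loggingspace\">" +
  "<logentry id=\"1\"><source>web</source></logentry>" +
  "<logentry id=\"2\"><source>db</source></logentry>" +
  "</log>";

XNamespace ns = "http://loggingspace";
var entries = new List<string>();

using (XmlReader r = XmlReader.Create (new StringReader (xml)))
{
  r.ReadStartElement ("log", ns.NamespaceName);
  while (r.NodeType == XmlNodeType.Element && r.LocalName == "logentry")
  {
    // ReadFrom consumes just this subtree and leaves the reader on the next node.
    var entry = (XElement) XNode.ReadFrom (r);
    int id = (int) entry.Attribute ("id");                   // Unprefixed attribute: no namespace
    string source = (string) entry.Element (ns + "source");  // Inherited default namespace
    entries.Add (id + ":" + source);
  }
  r.ReadEndElement();
}

Console.WriteLine (string.Join (",", entries));   // 1:web,2:db
```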


Using XmlWriter with XElement


You can use an XElement just to write inner elements to an XmlWriter. The
following code writes a million logentry elements to an XML file using XElement,
without storing the entire thing in memory:


using XmlWriter w = XmlWriter.Create ("logfile.xml");

w.WriteStartElement ("log");
for (int i = 0; i < 1000000; i++)
{
  XElement e = new XElement ("logentry",
                 new XAttribute ("id", i),
                 new XElement ("date", DateTime.Today.AddDays (-1)),
                 new XElement ("source", "test"));
  e.WriteTo (w);
}
w.WriteEndElement();
Using an XElement incurs minimal execution overhead. If we amend this example
to use XmlWriter throughout, there's no measurable difference in execution time.
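Here is a scaled-down version of that loop that writes three entries to a StringWriter instead of a million to logfile.xml, so the result can be inspected. The element names follow the example above; the date element is dropped for brevity, and the code is a top-level program.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

var sw = new StringWriter();
using (XmlWriter w = XmlWriter.Create (sw, new XmlWriterSettings { OmitXmlDeclaration = true }))
{
  w.WriteStartElement ("log");
  for (int i = 0; i < 3; i++)
  {
    // Each entry exists as an X-DOM only long enough to be written out.
    XElement e = new XElement ("logentry",
                   new XAttribute ("id", i),
                   new XElement ("source", "test"));
    e.WriteTo (w);
  }
  w.WriteEndElement();
}

string xml = sw.ToString();
Console.WriteLine (xml);
```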


Working with JSON 


JSON has become a popular alternative to XML. Although it lacks the advanced
features of XML (such as namespaces, prefixes, and schemas), it benefits from being
simple and uncluttered, with a format similar to what you would get from converting
a JavaScript object to a string.


In the past, you needed third-party libraries such as Json.NET to work with JSON in
C#, but now you have the option of using .NET Core’s built-in classes. Compared to
Json.NET, the built-in classes are less powerful, but simpler, faster, and more
memory efficient.


In this section, we cover the following: 


• The forward-only reader and writer (Utf8JsonReader and Utf8JsonWriter)
• The Document Object Model reader (JsonDocument)


In Chapter 17, we cover JsonSerializer, which automatically serializes and
deserializes JSON to classes.


Utf8JsonReader 


System.Text.Json.Utf8JsonReader is an optimized forward-only reader
for UTF-8 encoded JSON text. Conceptually, it's like the XmlReader introduced
earlier in this chapter, and is used in much the same way.


Consider the following JSON file named people.json: 


{ 
"FirstName":"Sara", 
"LastName":"Wells", 
"Age":35, 
"Friends": ["Dylan", "Ian" ] 
} 


The curly braces indicate a JSON object (which contains properties such as
"FirstName" and "LastName"), whereas the square brackets indicate a JSON array (which
contains repeating elements). In this case, the repeating elements are strings, but
they could be objects (or other arrays).


The following code parses the file by enumerating its JSON tokens. A token is the
beginning or end of an object, the beginning or end of an array, the name of a
property, or an array or property value (string, number, true, false, or null).


byte[] data = File.ReadAllBytes ("people.json");
Utf8JsonReader reader = new Utf8JsonReader (data);
while (reader.Read())
{
  switch (reader.TokenType)
  {
    case JsonTokenType.StartObject:
      Console.WriteLine ("Start of object");
      break;
    case JsonTokenType.EndObject:
      Console.WriteLine ("End of object");
      break;
    case JsonTokenType.StartArray:
      Console.WriteLine();
      Console.WriteLine ("Start of array");
      break;
    case JsonTokenType.EndArray:
      Console.WriteLine ("End of array");
      break;
    case JsonTokenType.PropertyName:
      Console.Write ($"Property: {reader.GetString()}");
      break;
    case JsonTokenType.String:
      Console.WriteLine ($" Value: {reader.GetString()}");
      break;
    case JsonTokenType.Number:
      Console.WriteLine ($" Value: {reader.GetInt32()}");
      break;
    default:
      Console.WriteLine ($"No support for {reader.TokenType}");
      break;
  }
}


Here's the output: 


Start of object
Property: FirstName Value: Sara
Property: LastName Value: Wells
Property: Age Value: 35
Property: Friends
Start of array
 Value: Dylan
 Value: Ian
End of array
End of object


Because Utf8JsonReader works directly with UTF-8, it steps through the tokens
without first having to convert the input into UTF-16 (the format of .NET strings).
Conversion to UTF-16 takes place only when you call a method such as
GetString().


Interestingly, Utf8JsonReader’s constructor does not accept a byte array, but rather 
a ReadOnlySpan<byte> (for this reason, Utf8JsonReader is defined as a ref struct). 
You can pass in a byte array because there’s an implicit conversion from T[] to 
ReadOnlySpan<T>. In Chapter 24, we describe how spans work, and how you can 
use them to improve performance by minimizing memory allocations. 
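If your JSON starts out as a .NET string rather than a UTF-8 file, you can convert it to bytes first. The following is a minimal sketch that collects property names, using a hypothetical JsonTokenDemo helper:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

static class JsonTokenDemo
{
  // Returns the names of all properties in the JSON text, in document order.
  public static List<string> GetPropertyNames (string json)
  {
    // Utf8JsonReader consumes UTF-8 bytes, so encode the UTF-16 string first.
    // The byte[] converts implicitly to ReadOnlySpan<byte>.
    byte[] utf8 = Encoding.UTF8.GetBytes (json);
    var reader = new Utf8JsonReader (utf8);

    var names = new List<string>();
    while (reader.Read())
      if (reader.TokenType == JsonTokenType.PropertyName)
        names.Add (reader.GetString());   // UTF-8 to UTF-16 conversion happens here
    return names;
  }
}
```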
JsonReaderOptions


By default, Utf8JsonReader requires that the JSON conform strictly to the JSON 
RFC 8259 standard. You can instruct the reader to be more tolerant by passing an 
instance of JsonReaderOptions to the Utf8JsonReader constructor. The options 
allow the following: 


C-style comments
By default, comments in JSON cause a JsonException to be thrown. Setting
the CommentHandling property to JsonCommentHandling.Skip causes comments
to be ignored, whereas JsonCommentHandling.Allow causes the reader
to recognize them and emit JsonTokenType.Comment tokens when they are
encountered. Comments cannot appear in the middle of other tokens.


Trailing commas
Per the standard, the last property of an object and the last element of an array
must not have a trailing comma. Setting the AllowTrailingCommas property to
true relaxes this restriction.


Control over the maximum nesting depth
By default, objects and arrays can nest to 64 levels. Setting the MaxDepth property to a
different number overrides this limit.
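The following sketch combines all three options (LenientReaderDemo is a hypothetical name). It accepts a comment and a trailing comma that would otherwise cause the reader to throw, and it caps nesting at eight levels.

```csharp
using System.Text;
using System.Text.Json;

static class LenientReaderDemo
{
  // Counts number tokens in JSON that may contain comments and trailing commas.
  public static int CountNumbers (string json)
  {
    var options = new JsonReaderOptions
    {
      CommentHandling = JsonCommentHandling.Skip,  // Ignore /* ... */ and // comments
      AllowTrailingCommas = true,                  // Accept [1, 2, 3,]
      MaxDepth = 8                                 // Deeper nesting throws
    };
    var reader = new Utf8JsonReader (Encoding.UTF8.GetBytes (json), options);

    int count = 0;
    while (reader.Read())
      if (reader.TokenType == JsonTokenType.Number) count++;
    return count;
  }
}
```

With these options, CountNumbers ("[1, 2, /* three */ 3,]") returns 3. With default options, the same input would be rejected.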


Utf8JsonWriter 


System.Text.Json.Utf8JsonWriter is a forward-only JSON writer. It supports the
following types:


• String and DateTime (which is formatted as a JSON string)
• The numeric types Int32, UInt32, Int64, UInt64, Single, Double, and Decimal
  (which are formatted as JSON numbers)
• bool (formatted as JSON true/false literals)
• JSON null
• Arrays


You can organize these data types into objects in accordance with the JSON
standard. Utf8JsonWriter also lets you write comments, which are not part of the JSON
standard but are often supported by JSON parsers in practice.


The following code demonstrates its use: 


var options = new JsonWriterOptions { Indented = true };

using (var stream = File.Create ("MyFile.json"))
using (var writer = new Utf8JsonWriter (stream, options))
{
  writer.WriteStartObject();
  // Property name and value specified in one call
  writer.WriteString ("FirstName", "Dylan");
  writer.WriteString ("LastName", "Lockwood");
  // Property name and value specified in separate calls
  writer.WritePropertyName ("Age");
  writer.WriteNumberValue (46);
  writer.WriteCommentValue ("This is a (non-standard) comment");
  writer.WriteEndObject();
}


This generates the following output file: 


{
  "FirstName": "Dylan",
  "LastName": "Lockwood",
  "Age": 46
  /*This is a (non-standard) comment*/
}


In this example, we set the Indented property on JsonWriterOptions to true to 
improve readability. Had we not done so, the output would be as follows: 


{"FirstName":"Dylan","LastName":"Lockwood","Age":46...} 


JsonWriterOptions also has an Encoder property to control the escaping of
strings, and a SkipValidation property that bypasses structural validation checks
(allowing invalid JSON to be emitted).
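For comparison, here is a minimal sketch that writes to a MemoryStream with default options and so produces the compact, non-indented form (CompactWriterDemo is a hypothetical name). It also uses WriteNumber, which writes a property name and a numeric value in one call.

```csharp
using System.IO;
using System.Text;
using System.Text.Json;

static class CompactWriterDemo
{
  // Writes a small object without indentation and returns it as a string.
  public static string WritePerson (string firstName, int age)
  {
    using var stream = new MemoryStream();
    using (var writer = new Utf8JsonWriter (stream))   // Default options: not indented
    {
      writer.WriteStartObject();
      writer.WriteString ("FirstName", firstName);
      writer.WriteNumber ("Age", age);
      writer.WriteEndObject();
    }   // Disposing the writer flushes its buffer to the stream
    return Encoding.UTF8.GetString (stream.ToArray());
  }
}
```

For example, WritePerson ("Dylan", 46) returns {"FirstName":"Dylan","Age":46}.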


JsonDocument 


System.Text.Json.JsonDocument parses JSON data into a read-only DOM
composed of lazily populated JsonElement instances that you can access randomly.


JsonDocument is fast and efficient, employing pooled memory
to minimize garbage collection. This means that you must
dispose the JsonDocument after use; otherwise, its memory will
not be returned to the pool.


The static Parse method instantiates a JsonDocument from a stream, string, or 
memory buffer: 


using JsonDocument document = JsonDocument.Parse (jsonString); 


When calling Parse, you can optionally provide a JsonDocumentOptions object to 
control the handling of trailing commas, comments, and the maximum nesting 
depth (for a discussion on how these options work, see “JsonReaderOptions” on 
page 518). 


From there, you can access the DOM via the RootElement property: 


using JsonDocument document = JsonDocument.Parse ("123"); 
JsonElement root = document.RootElement; 
Console.WriteLine (root.ValueKind);   // Number


JsonElement can represent a JSON value (string, number, true/false, null), array, or 
object; the ValueKind property indicates which. 
The methods that we describe in the following section throw
an exception if the element isn't of the kind expected. If you're
not sure of a JSON file's schema, you can avoid such
exceptions by checking ValueKind first.


JsonElement also provides two methods that work for any 
kind of element: GetRawText() returns the inner JSON, and 
WriteTo writes that element to a Utf8JsonWriter. 


Reading simple values 


If the element represents a JSON value, you can obtain its value by calling 
GetString, GetInt32, GetBoolean, etc.: 


using JsonDocument document = JsonDocument.Parse ("123"); 
int number = document.RootElement.GetInt32(); 


JsonElement also provides methods to parse JSON strings into other commonly 
used CLR types such as DateTime (and even base-64 binary). There are also Try* 
versions that avoid throwing an exception if the parse fails. 
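The following sketch reads an ISO 8601 string as a DateTime (TypedValueDemo is a hypothetical name). It also shows that the Try* methods return false when a value can't be parsed as the requested type. They still throw if the element is the wrong kind altogether, for example if you call TryGetInt32 on a string.

```csharp
using System;
using System.Text.Json;

static class TypedValueDemo
{
  const string Json =
    @"{ ""When"": ""2019-10-01T12:00:00"", ""Note"": ""soon"", ""Ratio"": 3.5 }";

  public static DateTime ReadWhen()
  {
    using JsonDocument doc = JsonDocument.Parse (Json);
    // A JSON string in ISO 8601 format can be read directly as a DateTime.
    return doc.RootElement.GetProperty ("When").GetDateTime();
  }

  public static (bool noteIsDate, bool ratioIsInt) TryReads()
  {
    using JsonDocument doc = JsonDocument.Parse (Json);
    JsonElement root = doc.RootElement;
    // "Note" is a string but not a date; "Ratio" is a number but not an Int32.
    bool noteIsDate = root.GetProperty ("Note").TryGetDateTime (out _);
    bool ratioIsInt = root.GetProperty ("Ratio").TryGetInt32 (out _);
    return (noteIsDate, ratioIsInt);
  }
}
```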


Reading JSON arrays 


If the JsonElement represents an array, you can call the following methods: 


EnumerateArray() 
Enumerates all the sub-items for a JSON array (as JsonElements). 


GetArrayLength() 
Returns the number of elements in the array. 


You can also use the indexer to return an element at a specific position: 


using JsonDocument document = JsonDocument.Parse (@"[1, 2, 3, 4, 5]");
int length = document.RootElement.GetArrayLength();   // 5
int value  = document.RootElement[3].GetInt32();      // 4
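Combining EnumerateArray with a ValueKind check gives a simple, defensive way to process an array when you can't be sure of the schema. Here is a minimal sketch (ArrayDemo.Sum is a hypothetical helper):

```csharp
using System.Text.Json;

static class ArrayDemo
{
  // Sums a JSON array of integers. A non-array root yields 0 rather than
  // an exception, because ValueKind is checked first.
  public static int Sum (string json)
  {
    using JsonDocument doc = JsonDocument.Parse (json);
    JsonElement root = doc.RootElement;
    if (root.ValueKind != JsonValueKind.Array) return 0;

    int total = 0;
    foreach (JsonElement item in root.EnumerateArray())
      total += item.GetInt32();
    return total;
  }
}
```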


Reading JSON objects 


If the element represents a JSON object, you can call the following methods: 


EnumerateObject() 
Enumerates all of the object's property names and values. 


GetProperty (string propertyName)
Gets a property by name (returning another JsonElement). Throws an
exception if the name isn't present.


TryGetProperty (string propertyName, out JsonElement value)
Returns true and outputs the property's value if it's present; otherwise, returns false.





520 | Chapter 11: Other XML and JSON Technologies 


For example: 


using JsonDocument document = JsonDocument.Parse (@"{ ""Age"": 32}"); 
JsonElement root = document.RootElement; 
int age = root.GetProperty ("Age").GetInt32();


Here’s how we could “discover” the Age property: 


JsonProperty ageProp = root.EnumerateObject().First();

string name = ageProp.Name;             // Age
JsonElement value = ageProp.Value;
Console.WriteLine (value.ValueKind);    // Number
Console.WriteLine (value.GetInt32());   // 32
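When a property might be absent, TryGetProperty avoids relying on the exception that GetProperty throws. The following is a minimal sketch using a hypothetical TryReadAge helper. It assumes that the root element is an object, because TryGetProperty throws if it isn't.

```csharp
using System.Text.Json;

static class PropertyDemo
{
  // Returns the numeric "Age" property if present; otherwise, null.
  public static int? TryReadAge (string json)
  {
    using JsonDocument doc = JsonDocument.Parse (json);
    if (doc.RootElement.TryGetProperty ("Age", out JsonElement age)
        && age.ValueKind == JsonValueKind.Number)
      return age.GetInt32();
    return null;
  }
}
```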


JsonDocument and LINQ 
JsonDocument lends itself well to LINQ. Given the following JSON file: 


[
  {
    "FirstName":"Sara",
    "LastName":"Wells",
    "Age":35,
    "Friends":["Ian"]
  },
  {
    "FirstName":"Ian",
    "LastName":"Weems",
    "Age":42,
    "Friends":["Joe","Eric","Li"]
  },
  {
    "FirstName":"Dylan",
    "LastName":"Lockwood",
    "Age":46,
    "Friends":["Sara","Ian"]
  }
]


we can use JsonDocument to query this with LINQ, as follows: 


using var json = File.OpenRead (jsonPath);
using JsonDocument document = JsonDocument.Parse (json);

var query =
  from person in document.RootElement.EnumerateArray()
  select new
  {
    FirstName = person.GetProperty ("FirstName").GetString(),
    Age = person.GetProperty ("Age").GetInt32(),
    Friends =
      from friend in person.GetProperty ("Friends").EnumerateArray()
      select friend.GetString()
  };
Because LINQ queries are lazily evaluated, it's important to enumerate the query
before the document goes out of scope and JsonDocument is implicitly disposed by
virtue of the using statement.
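One way to satisfy this is to materialize the results, for example with ToList, while the document is still alive. The following sketch (LinqDemo.Summarize is a hypothetical helper) returns a list rather than the lazy query. Returning the query itself would make it read from an already-disposed document when the caller enumerated it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

static class LinqDemo
{
  // Projects each person's first name and friend count, forcing evaluation
  // with ToList before 'document' is disposed at the end of the method.
  public static List<(string FirstName, int FriendCount)> Summarize (string json)
  {
    using JsonDocument document = JsonDocument.Parse (json);
    return
      (from person in document.RootElement.EnumerateArray()
       select (person.GetProperty ("FirstName").GetString(),
               person.GetProperty ("Friends").GetArrayLength()))
      .ToList();
  }
}
```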


Making updates with a JSON writer 


Although JsonDocument is read-only, you can send the content of a JsonElement to
a Utf8JsonWriter with the WriteTo method. This provides a mechanism for emitting
a modified version of the JSON. Here's how we can take the JSON from the preceding
example and write it to a new JSON file that includes only people with two
or more friends:


using var json = File.OpenRead (jsonPath);
using JsonDocument document = JsonDocument.Parse (json);

var options = new JsonWriterOptions { Indented = true };

using (var outputStream = File.Create ("NewFile.json"))
using (var writer = new Utf8JsonWriter (outputStream, options))
{
  writer.WriteStartArray();
  foreach (var person in document.RootElement.EnumerateArray())
  {
    int friendCount = person.GetProperty ("Friends").GetArrayLength();
    if (friendCount >= 2)
      person.WriteTo (writer);
  }
  writer.WriteEndArray();   // Close the array so that the output is valid JSON
}
12


Disposal and Garbage Collection 








Some objects require explicit tear-down code to release resources such as open files, 
locks, operating system handles, and unmanaged objects. In .NET parlance, this is 
called disposal, and it is supported through the IDisposable interface. The managed 
memory occupied by unused objects must also be reclaimed at some point; this 
function is known as garbage collection and is performed by the CLR. 


Disposal differs from garbage collection in that disposal is usually explicitly
instigated; garbage collection is totally automatic. In other words, the programmer takes
care of such things as releasing file handles, locks, and operating system resources
while the CLR takes care of releasing memory.


This chapter discusses both disposal and garbage collection, also describing C# 
finalizers and the pattern by which they can provide a backup for disposal. Lastly, 
we discuss the intricacies of the garbage collector and other memory management 
options. 


IDisposable, Dispose, and Close 


.NET Core defines a special interface for types requiring a tear-down method:


public interface IDisposable
{
  void Dispose();
}

C#’s using statement provides a syntactic shortcut for calling Dispose on objects 
that implement IDisposable, using a try/finally block: 


using (FileStream fs = new FileStream ("myFile.txt", FileMode.Open))
{
  // ... Write to the file ...
}


The compiler converts this to the following: 
FileStream fs = new FileStream ("myFile.txt", FileMode.Open);
try
{
  // ... Write to the file ...
}
finally
{
  if (fs != null) ((IDisposable)fs).Dispose();
}


The finally block ensures that the Dispose method is called even when an exception
is thrown or the code exits the block early.
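The following sketch checks this behavior with a hypothetical DisposeTracker class that records whether Dispose was called. The exception escapes the using block, yet Dispose has already run by the time the catch block executes.

```csharp
using System;

// Hypothetical disposable type that records whether Dispose was called.
sealed class DisposeTracker : IDisposable
{
  public bool Disposed { get; private set; }
  public void Dispose() => Disposed = true;
}

static class UsingDemo
{
  public static bool DisposedDespiteException()
  {
    var tracker = new DisposeTracker();
    try
    {
      using (tracker)
      {
        // Simulate a failure partway through the block.
        throw new InvalidOperationException ("Simulated failure");
      }
    }
    catch (InvalidOperationException)
    {
      // By now, the compiler-generated finally block has called Dispose.
    }
    return tracker.Disposed;
  }
}
```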


Similarly, the following syntax ensures disposal as soon as fs goes out of scope: 


using FileStream fs = new FileStream ("myFile.txt", FileMode.Open); 


// ... Write to the file ... 


In simple scenarios, writing your own disposable type is just a matter of implementing
IDisposable and writing the Dispose method:


sealed class Demo : IDisposable
{
  public void Dispose()
  {
    // Perform cleanup / tear-down.
  }
}



This pattern works well in simple cases and is appropriate for 
sealed classes. In “Calling Dispose from a Finalizer” on page 
532, we describe a more elaborate pattern that can provide a 
backup for consumers that forget to call Dispose. With 
unsealed types, there’s a strong case for following this latter 
pattern from the outset—otherwise, it becomes very messy if 
the subtype wants to add such functionality itself. 


Standard Disposal Semantics 


.NET Core follows a de facto set of rules in its disposal logic. These rules are not 
hard-wired to .NET Core or the C# language in any way; their purpose is to define a 
consistent protocol to consumers. Here they are: 


1. After an object has been disposed, it's beyond redemption. It cannot be
reactivated, and calling its methods or properties (other than Dispose) throws an
ObjectDisposedException.


2. Calling an object's Dispose method repeatedly causes no error.


3. If disposable object x “owns” disposable object y, x's Dispose method
automatically calls y's Dispose method, unless instructed otherwise.
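Here is a minimal sketch of a type that follows all three rules. LineLogger is a hypothetical class that owns the stream passed to its constructor, much as a DeflateStream owns the stream it wraps by default.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical type that owns the stream passed to its constructor.
sealed class LineLogger : IDisposable
{
  readonly Stream _stream;   // Owned: disposed along with this object
  bool _disposed;

  public LineLogger (Stream stream) => _stream = stream;

  public void Log (string message)
  {
    // Rule 1: a disposed object refuses further use.
    if (_disposed) throw new ObjectDisposedException (nameof (LineLogger));
    byte[] bytes = Encoding.UTF8.GetBytes (message + "\n");
    _stream.Write (bytes, 0, bytes.Length);
  }

  public void Dispose()
  {
    if (_disposed) return;   // Rule 2: repeated calls are harmless
    _disposed = true;
    _stream.Dispose();       // Rule 3: dispose the object we own
  }
}
```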
These rules are also helpful when writing your own types, though they're not
mandatory. Nothing prevents you from writing an Undispose method other than,
perhaps, the flak you might cop from colleagues!


According to rule 3, a container object automatically disposes its child objects. A
good example is a Windows Forms container control such as a Form or Panel. The
container can host many child controls, yet you don't dispose every one of them
explicitly; closing or disposing the parent control or form takes care of the whole
lot. Another example is when you wrap a FileStream in a DeflateStream. Disposing
the DeflateStream also disposes the FileStream, unless you instructed otherwise
in the constructor.
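The leaveOpen parameter of DeflateStream's constructor is how you instruct it otherwise. In this sketch (LeaveOpenDemo is a hypothetical name), a MemoryStream stands in for the FileStream so that you can check its state after the DeflateStream has been disposed:

```csharp
using System.IO;
using System.IO.Compression;

static class LeaveOpenDemo
{
  // Compresses 'data' into 'target'. When leaveOpen is false, disposing the
  // DeflateStream also disposes 'target'.
  public static void Compress (Stream target, byte[] data, bool leaveOpen)
  {
    using var deflate = new DeflateStream (target, CompressionMode.Compress, leaveOpen);
    deflate.Write (data, 0, data.Length);
  }   // 'deflate' is disposed here, which flushes the compressed data
}
```

After Compress returns with leaveOpen: true, the MemoryStream is still readable and holds the compressed bytes. With leaveOpen: false, its CanRead property returns false.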


Close and Stop 


Some types define a method called Close in addition to Dispose. The Framework is 
not completely consistent on the semantics of a Close method, although in nearly 
all cases it’s either of the following: 


• Functionally identical to Dispose
• A functional subset of Dispose


An example of the latter is IDbConnection: a Closed connection can be re-Opened; a 
Disposed connection cannot. Another example is a Windows Form activated with 
ShowDialog: Close hides it; Dispose releases its resources. 


Some classes define a Stop method (e.g., Timer or HttpListener). A Stop method 
may release unmanaged resources, like Dispose, but unlike Dispose, it allows for 
re-Starting. 


With Windows Runtime (WinRT) libraries, Close is considered identical to 
Dispose—in fact, the runtime projects methods called Close into methods called 
Dispose, to make their types friendly to using statements. 


When to Dispose 


A safe rule to follow (in nearly all cases) is “if in doubt, dispose.” Objects wrapping
an unmanaged resource handle will nearly always require disposal in order to free
the handle. Examples include file or network streams, network sockets, Windows
Forms controls, GDI+ pens, brushes, and bitmaps. Conversely, if a type is disposable,
it will often (but not always) reference an unmanaged handle, directly or
indirectly. This is because unmanaged handles provide the gateway to the “outside
world” of OS resources, network connections, and database locks: the primary means
by which objects can create trouble outside of themselves if improperly abandoned.


There are, however, three scenarios for not disposing: 


• When you don't “own” the object; for example, when obtaining a shared object
  via a static field or property
• When an object's Dispose method does something that you don't want
• When an object's Dispose method is unnecessary by design, and disposing that
  object would add complexity to your program


The first category is rare. The main cases are in the System.Drawing namespace: the
GDI+ objects obtained through static fields or properties (such as Brushes.Blue)
must never be disposed because the same instance is used throughout the life of the
application. Instances that you obtain through constructors, however (such as
new SolidBrush), should be disposed, as should instances obtained through static
methods (such as Font.FromHdc).


The second category is more common. There are some good examples in the
System.IO and System.Data namespaces:


Type | Disposal function | When not to dispose
MemoryStream | Prevents further I/O | When you later need to read/write the stream
StreamReader, StreamWriter | Flushes the reader/writer and closes the underlying stream | When you want to keep the underlying stream open (you must then call Flush on a StreamWriter when you're done)
IDbConnection | Releases a database connection and clears the connection string | If you need to re-Open it, you should call Close instead of Dispose
DbContext (EF Core) | Prevents further use | When you might have lazily evaluated queries connected to that context





MemoryStream’s Dispose method disables only the object; it doesn’t perform any 
critical cleanup because a MemoryStream holds no unmanaged handles or other such 
resources. 


The third category includes the following classes: WebClient, StringReader, and
StringWriter. These types are disposable under the duress of their base class rather
than through a genuine need to perform essential cleanup. If you happen to
instantiate and work with such an object entirely in one method, wrapping it in a using
block adds little inconvenience. But if the object is longer lasting, keeping track of
when it's no longer used so that you can dispose of it adds unnecessary complexity.
In such cases, you can simply ignore object disposal.


Ignoring disposal can sometimes incur a performance cost 
(see “Calling Dispose from a Finalizer” on page 532). 


Clearing Fields in Disposal 


In general, you don't need to clear an object's fields in its Dispose method. However,
it is good practice to unsubscribe from events that the object has subscribed to
internally over its lifetime (for an example, see “Managed Memory Leaks” on page
542). Unsubscribing from such events avoids receiving unwanted event
notifications, and it avoids unintentionally keeping the object alive in the eyes of the
garbage collector (GC).


A Dispose method itself does not cause (managed) memory 
to be released—this can happen only in garbage collection. 


It's also worth setting a field to indicate that the object is disposed so that you can 
throw an ObjectDisposedException if a consumer later tries to call members on 
the object. A good pattern is to use a publicly readable automatic property for this: 


public bool IsDisposed { get; private set; } 


Although technically unnecessary, it can also be good to clear an object's own event
handlers (by setting them to null) in the Dispose method. This eliminates the
possibility of those events firing during or after disposal.


Occasionally, an object holds high-value secrets, such as encryption keys. In these
cases, it can make sense to clear such data from fields during disposal (to avoid
potential discovery by other processes on the machine when the memory is later
released to the OS). The SymmetricAlgorithm class in System.Security.Cryptography
does exactly this by calling Array.Clear on the byte array holding
the encryption key.
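The following sketch pulls these suggestions together in a hypothetical KeyHolder class. It exposes a publicly readable IsDisposed property and throws ObjectDisposedException on later calls. It also clears its own event handlers and zeroes a secret key with Array.Clear, mirroring what the text describes for SymmetricAlgorithm.

```csharp
using System;

// Hypothetical type holding a secret key.
sealed class KeyHolder : IDisposable
{
  readonly byte[] _key;
  public event EventHandler KeyUsed;
  public bool IsDisposed { get; private set; }

  public KeyHolder (byte[] key) => _key = key;

  public byte FirstKeyByte()
  {
    if (IsDisposed) throw new ObjectDisposedException (nameof (KeyHolder));
    KeyUsed?.Invoke (this, EventArgs.Empty);
    return _key[0];
  }

  public void Dispose()
  {
    if (IsDisposed) return;
    Array.Clear (_key, 0, _key.Length);   // Zero the secret so it doesn't linger
    KeyUsed = null;                       // Our own handlers can no longer fire
    IsDisposed = true;
  }
}
```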


Anonymous Disposal 


Sometimes, it’s useful to implement IDisposable without having to write a class. 
For instance, suppose that you want to expose methods on a class that suspend and 
resume event processing: 


class Foo
{
  int _suspendCount;

  public void SuspendEvents() => _suspendCount++;
  public void ResumeEvents() => _suspendCount--;

  void FireSomeEvent()
  {
    if (_suspendCount == 0)
    {
      // ... fire some event ...
    }
  }
}


Such an API is clumsy to use. Consumers must remember to call ResumeEvents. 
And to be robust, they must do so in a finally block (in case an exception is 
thrown): 


var foo = new Foo();
foo.SuspendEvents();
try
{
  // ... do stuff ...      // Because an exception could be thrown here...
}
finally
{
  foo.ResumeEvents();      // ...we must call this in a finally block
}


A better pattern is to do away with ResumeEvents and have SuspendEvents return 
an IDisposable. Consumers can then do this: 


using (foo.SuspendEvents())
{
  // ... do stuff ...
}


The problem is that this pushes work onto whoever has to implement the
SuspendEvents method. Even with a good effort to reduce whitespace, we end up with this
extra clutter:


public IDisposable SuspendEvents()
{
  _suspendCount++;
  return new SuspendToken (this);
}

// Nested inside Foo, so that it can access the private _suspendCount field
class SuspendToken : IDisposable
{
  Foo _foo;
  public SuspendToken (Foo foo) => _foo = foo;
  public void Dispose()
  {
    if (_foo != null) _foo._suspendCount--;
    _foo = null;       // Protect against consumer disposing twice
  }
}


The anonymous disposal pattern solves this problem. With the following reusable 
class: 


public class Disposable : IDisposable
{
  public static Disposable Create (Action onDispose)
    => new Disposable (onDispose);

  Action _onDispose;
  Disposable (Action onDispose) => _onDispose = onDispose;

  public void Dispose()
  {
    _onDispose?.Invoke();   // Execute disposal action if non-null.
    _onDispose = null;      // Ensure it can't execute a second time.
  }
}


We can reduce our SuspendEvents method to the following: 
public IDisposable SuspendEvents()
{
  _suspendCount++;
  return Disposable.Create (() => _suspendCount--);
}
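Putting the pieces together, here is a complete, compilable version of Foo that uses the Disposable helper. The public SuspendCount property is an addition for illustration only, so that you can observe the counter. The test of double disposal relies on Disposable clearing its action after the first call.

```csharp
using System;

public class Disposable : IDisposable
{
  public static Disposable Create (Action onDispose) => new Disposable (onDispose);

  Action _onDispose;
  Disposable (Action onDispose) => _onDispose = onDispose;

  public void Dispose()
  {
    _onDispose?.Invoke();   // Run the action on the first call only
    _onDispose = null;
  }
}

class Foo
{
  int _suspendCount;
  public int SuspendCount => _suspendCount;   // Added for illustration

  public IDisposable SuspendEvents()
  {
    _suspendCount++;
    return Disposable.Create (() => _suspendCount--);
  }
}
```

Nested suspensions compose naturally: each using block undoes exactly one increment, and disposing the same token twice has no further effect.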


Automatic Garbage Collection 


Regardless of whether an object requires a Dispose method for custom tear-down
logic, at some point the memory it occupies on the heap must be freed. The CLR
handles this side of it entirely automatically, via a garbage collector (GC). You never
deallocate managed memory yourself. For example, consider the following method:


public void Test()
{
  byte[] myArray = new byte[1000];
}


When Test executes, an array to hold 1,000 bytes is allocated on the memory heap. 
The array is referenced by the variable myArray, stored on the local variable stack. 
When the method exits, this local variable myArray pops out of scope, meaning that 
nothing is left to reference the array on the memory heap. The orphaned array then 
becomes eligible to be reclaimed in garbage collection. 


In debug mode with optimizations disabled, the lifetime of an 
object referenced by a local variable extends to the end of the 
code block to ease debugging. Otherwise, it becomes eligible 
for collection at the earliest point at which it’s no longer used. 


Garbage collection does not happen immediately after an object is orphaned. Rather 
like garbage collection on the street, it happens periodically, although (unlike 
garbage collection on the street) not to a fixed schedule. The CLR bases its decision 
on when to collect upon a number of factors, such as the available memory, the 
amount of memory allocation, and the time since the last collection (the GC self- 
tunes to optimize for an application’s specific memory access patterns). This means 
that there's an indeterminate delay between an object being orphaned and being 
released from memory. This delay can range from nanoseconds to days. 


The GC doesn't collect all garbage with every collection.
Instead, the memory manager divides objects into generations,
and the GC collects new generations (recently allocated
objects) more frequently than old generations (long-lived
objects). We discuss this in more detail in “How the GC
Works” on page 536.
Garbage Collection and Memory Consumption


The GC tries to strike a balance between the time it spends doing garbage collection
and the application's memory consumption (working set). Consequently,
applications can consume more memory than they need, particularly if large
temporary arrays are constructed.


You can monitor a process's memory consumption via the Windows Task Manager 
or Resource Monitor—or programmatically by querying a performance counter: 


// These types are in System.Diagnostics:
string procName = Process.GetCurrentProcess().ProcessName;

using PerformanceCounter pc = new PerformanceCounter
  ("Process", "Private Bytes", procName);

Console.WriteLine (pc.NextValue());


This queries the private working set, which gives the best overall indication of your 
program’s memory consumption. Specifically, it excludes memory that the CLR has 
internally deallocated and is willing to rescind to the OS should another process 
need it. 
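PerformanceCounter works only on Windows. For a rough, cross-platform snapshot, you could instead read two other values. GC.GetTotalMemory reports the bytes currently allocated on the managed heap. Environment.WorkingSet reports the physical memory mapped to the process. Neither gives exactly the same figure as the "Private Bytes" counter, so treat this sketch (MemoryDemo is a hypothetical name) as an approximation.

```csharp
using System;

static class MemoryDemo
{
  // Returns (managed heap bytes, working set bytes) for the current process.
  public static (long managedHeap, long workingSet) Snapshot() =>
    (GC.GetTotalMemory (forceFullCollection: false), Environment.WorkingSet);
}
```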











Roots 


A root is something that keeps an object alive. If an object is not directly or
indirectly referenced by a root, it will be eligible for garbage collection.


A root is one of the following: 


• A local variable or parameter in an executing method (or in any method in its
  call stack)
• A static variable
• An object on the queue that stores objects ready for finalization (see the next
  section)


It's impossible for code to execute in a deleted object, so if there’s any possibility of 
an (instance) method executing, its object must somehow be referenced in one of 
these ways. 


Note that a group of objects that reference each other cyclically is considered dead
if no root refers to any of them (see Figure 12-1). To put it another way, objects that
cannot be accessed by following the arrows (references) from a root object are
unreachable, and therefore subject to collection.
Figure 12-1. Roots


Garbage Collection and WinRT 


WinRT relies on a reference-counting mechanism to release memory instead of 
depending on an automatic GC. Despite this, WinRT objects that you instantiate 
from C# have their lifetime managed by the CLR’s GC because the CLR mediates 
access to the underlying COM object through an object that it creates behind the 
scenes called a runtime callable wrapper (Chapter 24). 


Finalizers 


Prior to an object being released from memory, its finalizer runs, if it has one. A 
finalizer is declared like a constructor, but it is prefixed by the ~ symbol: 


class Test
{
  ~Test()
  {
    // Finalizer logic...
  }
}


(Although similar in declaration to a constructor, finalizers cannot be declared as 
public or static, cannot have parameters, and cannot call the base class.) 


Finalizers are possible because garbage collection works in distinct phases. First, the 
GC identifies the unused objects ripe for deletion. Those without finalizers are 
deleted immediately. Those with pending (unrun) finalizers are kept alive (for now) 
and are put onto a special queue. 
At that point, garbage collection is complete, and your program continues
executing. The finalizer thread then kicks in and starts running in parallel to your
program, picking objects off that special queue and running their finalization methods.
Prior to each object's finalizer running, it's still very much alive; that queue acts as a
root object. After it's been dequeued and the finalizer executed, the object becomes
orphaned and will be deleted in the next collection (for that object's generation).


Finalizers can be useful, but they come with some provisos:

• Finalizers slow the allocation and collection of memory (the GC needs to keep
  track of which finalizers have run).
• Finalizers prolong the life of the object and any referred objects (they must all
  await the next garbage truck for actual deletion).
• It's impossible to predict in what order the finalizers for a set of objects will be
  called.
• You have limited control over when the finalizer for an object will be called.
• If code in a finalizer blocks, other objects cannot be finalized.
• Finalizers can be circumvented altogether if an application fails to unload
  cleanly.


In summary, finalizers are somewhat like lawyers—although there are cases in 
which you really need them, in general you don't want to use them unless absolutely 
necessary. If you do use them, you need to be 100% sure you understand what they 
are doing for you. 


Here are some guidelines for implementing finalizers:


• Ensure that your finalizer executes quickly.
• Never block in your finalizer (see “Blocking” on page 578 in Chapter 14).
• Don't reference other finalizable objects.
• Don't throw exceptions.


The CLR can call an object’s finalizer even if an exception is 
thrown during construction. For this reason, it pays not to 
assume that fields are correctly initialized when writing a 
finalizer. 


Calling Dispose from a Finalizer 


A popular pattern is to have the finalizer call Dispose. This makes sense when 
cleanup is not urgent and hastening it by calling Dispose is more of an optimization 
than a necessity. 
Keep in mind that with this pattern you couple memory
deallocation to resource deallocation: two things with potentially
divergent interests (unless the resource is itself memory). You
also increase the burden on the finalization thread.


This pattern also serves as a backup for cases when a
consumer simply forgets to call Dispose. However, it's then a
good idea to log the failure so that you can fix the bug.


There’s a standard pattern for implementing this, as follows: 


class Test : IDisposable
{
  public void Dispose()             // NOT virtual
  {
    Dispose (true);
    GC.SuppressFinalize (this);     // Prevent finalizer from running.
  }

  protected virtual void Dispose (bool disposing)
  {
    if (disposing)
    {
      // Call Dispose() on other objects owned by this instance.
      // You can reference other finalizable objects here.
    }

    // Release unmanaged resources owned by (just) this object.
  }

  ~Test() => Dispose (false);
}


Dispose is overloaded to accept a bool disposing flag. The parameterless version 
is not declared as virtual and simply calls the enhanced version with true. 


The enhanced version contains the actual disposal logic and is protected and 
virtual; this provides a safe point for subclasses to add their own disposal logic. 
The disposing flag means it’s being called “properly” from the Dispose method 
rather than in “last-resort mode” from the finalizer. The idea is that when called 
with disposing set to false, this method should not, in general, reference other 
objects with finalizers (because such objects might themselves have been finalized 
and so be in an unpredictable state). This rules out quite a lot! Here are a couple of 
tasks that the Dispose method can still perform in last-resort mode, when 
disposing is false: 


• Releasing any direct references to OS resources (obtained, perhaps, via a P/Invoke
call to the Win32 API)
• Deleting a temporary file created on construction


To make this robust, any code capable of throwing an exception should be wrapped
in a try/catch block, and the exception, ideally, logged. Any logging should be as
simple and robust as possible.


Notice that we call GC.SuppressFinalize in the parameterless Dispose method:
this prevents the finalizer from running when the GC later catches up with it. Technically,
this is unnecessary given that Dispose methods must tolerate repeated calls.
However, doing so improves performance because it allows the object (and its referenced
objects) to be garbage-collected in a single cycle.
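A simple flag is usually enough to make Dispose tolerate repeated calls. The following minimal sketch shows how a subclass plugs its own cleanup into the protected virtual Dispose(bool) hook. It assumes hypothetical classes ResourceHolder and DerivedHolder. The counter properties exist only to make the behavior observable.

```csharp
using System;

// Hypothetical base class following the Dispose(bool) pattern
class ResourceHolder : IDisposable
{
  bool _disposed;
  public int ManagedCleanups { get; private set; }   // illustration only

  public void Dispose()             // NOT virtual
  {
    Dispose (true);
    GC.SuppressFinalize (this);
  }

  protected virtual void Dispose (bool disposing)
  {
    if (_disposed) return;          // tolerate repeated calls
    if (disposing)
      ManagedCleanups++;            // stand-in for disposing owned objects
    // Release unmanaged resources owned by this object here.
    _disposed = true;
  }

  ~ResourceHolder() => Dispose (false);
}

// Hypothetical subclass adding its own disposal logic
class DerivedHolder : ResourceHolder
{
  bool _derivedDisposed;
  public int DerivedCleanups { get; private set; }

  protected override void Dispose (bool disposing)
  {
    if (!_derivedDisposed)
    {
      if (disposing) DerivedCleanups++;   // stand-in for subclass cleanup
      _derivedDisposed = true;
    }
    base.Dispose (disposing);       // always give the base class its turn
  }
}
```

The override calls base.Dispose(disposing) unconditionally. This ensures the base class always gets the chance to release its own resources, whichever path triggered disposal.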


Resurrection 


Suppose a finalizer modifies a living object such that it refers back to the dying 
object. When the next garbage collection happens (for the object’s generation), the 
CLR will see the previously dying object as no longer orphaned—and so it will 
evade garbage collection. This is an advanced scenario, and is called resurrection. 


To illustrate, suppose that we want to write a class that manages a temporary file.
When an instance of that class is garbage-collected, we'd like the finalizer to delete
the temporary file. It sounds easy:


public class TempFileRef
{
  public readonly string FilePath;
  public TempFileRef (string filePath) { FilePath = filePath; }

  ~TempFileRef() { File.Delete (FilePath); }
}


Unfortunately, this has a bug: File.Delete might throw an exception (due to a lack
of permissions, perhaps, or the file being in use). Such an exception would take
down the entire application (as well as preventing other finalizers from running).
We could simply “swallow” the exception with an empty catch block, but then we'd
never know that anything went wrong. Calling some elaborate error-reporting API
would also be undesirable because it would burden the finalizer thread, hindering
garbage collection for other objects. We want to restrict finalization actions to
those that are simple, reliable, and quick.


A better option is to record the failure to a static collection, as follows: 


public class TempFileRef
{
  static internal readonly ConcurrentQueue<TempFileRef> FailedDeletions
    = new ConcurrentQueue<TempFileRef>();

  public readonly string FilePath;
  public Exception DeletionError { get; private set; }

  public TempFileRef (string filePath) { FilePath = filePath; }

  ~TempFileRef()
  {
    try { File.Delete (FilePath); }
    catch (Exception ex)
    {
      DeletionError = ex;
      FailedDeletions.Enqueue (this);   // Resurrection
    }
  }
}


Enqueuing the object to the static FailedDeletions collection gives the object 
another referee, ensuring that it remains alive until the object is eventually 
dequeued. 


ConcurrentQueue<T> is a thread-safe version of Queue<T> and
is defined in System.Collections.Concurrent (see Chapter 23).
There are a couple of reasons for using a thread-safe
collection. First, the CLR reserves the right to execute finalizers
on more than one thread in parallel. This means that when
accessing shared state such as a static collection, we must consider
the possibility of two objects being finalized at once. Second,
at some point we're going to want to dequeue items from
FailedDeletions so that we can do something about them.
This also must be done in a thread-safe fashion because it
could happen while the finalizer is concurrently enqueuing
another object.
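A consumer of FailedDeletions might drain it periodically from an ordinary thread. In this sketch, RetryFailedDeletions is a hypothetical helper and is not part of the earlier listing. It relies on ConcurrentQueue<T>.TryDequeue, which is safe to call while a finalizer is concurrently enqueuing. The helper retries each deletion once and reports any file it still can't delete. A dequeued TempFileRef becomes unreachable again, and its finalizer will not run a second time.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public class TempFileRef
{
  static internal readonly ConcurrentQueue<TempFileRef> FailedDeletions
    = new ConcurrentQueue<TempFileRef>();

  public readonly string FilePath;
  public Exception DeletionError { get; private set; }

  public TempFileRef (string filePath) { FilePath = filePath; }

  ~TempFileRef()
  {
    try { File.Delete (FilePath); }
    catch (Exception ex)
    {
      DeletionError = ex;
      FailedDeletions.Enqueue (this);   // Resurrection
    }
  }

  // Hypothetical helper: call from an ordinary thread (not a finalizer).
  // Returns the number of queued objects that were retried.
  public static int RetryFailedDeletions()
  {
    int retried = 0;
    while (FailedDeletions.TryDequeue (out TempFileRef tf))
    {
      retried++;
      try { File.Delete (tf.FilePath); }
      catch (Exception ex)
      {
        // Safe to do elaborate reporting here: we're not on the finalizer thread
        Console.Error.WriteLine ($"Still cannot delete {tf.FilePath}: {ex.Message}");
      }
    }
    return retried;
  }
}
```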







GC.ReRegisterForFinalize 


A resurrected object’s finalizer will not run a second time—unless you call 
GC.ReRegisterForFinalize. 


In the following example, we try to delete a temporary file in a finalizer (as in the 
last example). But if the deletion fails, we reregister the object so as to try again in 
the next garbage collection: 


public class TempFileRef
{
  public readonly string FilePath;
  int _deleteAttempt;

  public TempFileRef (string filePath) { FilePath = filePath; }

  ~TempFileRef()
  {
    try { File.Delete (FilePath); }
    catch
    {
      // Retry at a later collection, giving up after the third failure
      if (++_deleteAttempt < 3) GC.ReRegisterForFinalize (this);
    }
  }
}


After the third failed attempt, our finalizer will silently give up trying to delete the
file. We could enhance this by combining it with the previous example: in other
words, adding it to the FailedDeletions queue after the third failure.
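The combination suggested above might look like the following sketch. A prefix increment counts the failures, so the object is enqueued on exactly the third failure. The finalizer also calls ReRegisterForFinalize at most once per run, for the reason given in the note that follows.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

public class TempFileRef
{
  static internal readonly ConcurrentQueue<TempFileRef> FailedDeletions
    = new ConcurrentQueue<TempFileRef>();

  public readonly string FilePath;
  public Exception DeletionError { get; private set; }
  int _deleteAttempt;

  public TempFileRef (string filePath) { FilePath = filePath; }

  ~TempFileRef()
  {
    try { File.Delete (FilePath); }
    catch (Exception ex)
    {
      if (++_deleteAttempt < 3)
        GC.ReRegisterForFinalize (this);   // failures 1 and 2: try again at a later GC
      else
      {
        DeletionError = ex;                // failure 3: stop retrying and
        FailedDeletions.Enqueue (this);    // resurrect for later inspection
      }
    }
  }
}
```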


Be careful to call ReRegisterForFinalize just once in the
finalizer method. If you call it twice, the object will be reregistered
twice and will have to undergo two more finalizations!


How the GC Works 


The standard CLR uses a generational mark-and-compact GC that performs automatic
memory management for objects stored on the managed heap. The GC is
considered to be a tracing GC in that it doesn’t interfere with every access to an
object, but rather wakes up intermittently and traces the graph of objects stored on
the managed heap to determine which objects can be considered garbage and therefore
collected.


The GC initiates a garbage collection upon performing a memory allocation (via the 
new keyword), either after a certain threshold of memory has been allocated or at 
other times to reduce the application’s memory footprint. This process can also be 
initiated manually by calling System.GC.Collect. During a garbage collection, all 
threads can be frozen (more on this in the next section). 


The GC begins with its root object references and walks the object graph, marking 
all the objects it touches as reachable. When this process is complete, all objects that 
have not been marked are considered unused and are subject to garbage collection. 


Unused objects without finalizers are immediately discarded; unused objects with
finalizers are enqueued for processing on the finalizer thread after the GC is complete.
These objects then become eligible for collection in the next GC for the
object’s generation (unless resurrected).


The remaining “live” objects are then shifted to the start of the heap (compacted),
freeing space for more objects. This compaction serves two purposes: it avoids
memory fragmentation, and it allows the GC to employ a very simple strategy when
allocating new objects, which is to always allocate memory at the end of the heap.
This avoids the potentially time-consuming task of maintaining a list of free memory
segments.


If there is insufficient space to allocate memory for a new object after garbage collection
and the OS is unable to grant further memory, an OutOfMemoryException is
thrown.


Optimization Techniques 


The GC incorporates various optimization techniques to reduce the garbage collection
time.


Generational collection


The most important optimization is that the GC is generational. This takes advantage
of the fact that although many objects are allocated and discarded rapidly, certain
objects are long-lived and thus don't need to be traced during every collection.


Basically, the GC divides the managed heap into three generations. Objects that
have just been allocated are in Gen0, and objects that have survived one collection
cycle are in Gen1; all other objects are in Gen2. Gen0 and Gen1 are known as
ephemeral (short-lived) generations.
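You can observe promotion through the generations with GC.GetGeneration. The following sketch records an object's generation before and after two forced collections. GenerationDemo is a hypothetical name. The exact values depend on the runtime, but on the standard CLR you'd typically see 0, 1, and then 2.

```csharp
using System;

static class GenerationDemo
{
  // Returns an object's generation: freshly allocated, after one
  // forced full collection, and after a second one.
  public static int[] Track()
  {
    var obj = new object();
    var gens = new int[3];
    gens[0] = GC.GetGeneration (obj);   // typically 0
    GC.Collect();
    gens[1] = GC.GetGeneration (obj);   // typically 1: survived one collection
    GC.Collect();
    gens[2] = GC.GetGeneration (obj);   // typically 2 (GC.MaxGeneration)
    GC.KeepAlive (obj);                 // keep obj reachable throughout
    return gens;
  }
}
```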


The CLR keeps the Gen0 section relatively small (with a typical size of a few hundred
kilobytes to a few megabytes). When the Gen0 section fills up, the GC instigates
a Gen0 collection, which happens relatively often. The GC applies a similar
memory threshold to Gen1 (which acts as a buffer to Gen2), and so Gen1 collections
are relatively quick and frequent, too. Full collections that include Gen2, however,
take much longer and so happen infrequently. Figure 12-2 shows the effect of a
full collection.


Figure 12-2. Heap generations (Gen2, Gen1, Gen0)


To give some very rough ballpark figures, a Gen0 collection might take less than one
millisecond, which is not enough to be noticed in a typical application. A full collection,
however, might take as long as 100 ms on a program with large object graphs.
These figures depend on numerous factors and so can vary considerably, particularly
in the case of Gen2, whose size is unbounded (unlike Gen0 and Gen1).


The upshot is that short-lived objects are very efficient in their use of the GC. The
StringBuilders created in the following method would almost certainly be collected
in a fast Gen0:


string Foo()
{
  var sb1 = new StringBuilder ("test");
  sb1.Append ("...");
  var sb2 = new StringBuilder ("test");
  sb2.Append (sb1.ToString());
  return sb2.ToString();
}


The Large Object Heap 


The GC uses a separate heap called the Large Object Heap (LOH) for objects larger
than a certain threshold (currently 85,000 bytes). This avoids the cost of compacting
large objects, and avoids excessive Gen0 collections: without the LOH, allocating a
series of 16 MB objects might trigger a Gen0 collection after every allocation.


By default, the LOH is not subject to compaction, because moving large blocks of 
memory during garbage collection would be prohibitively expensive. This has two 
consequences: 


• Allocations can be slower, because the GC can't always simply allocate objects
at the end of the heap: it must also look in the middle for gaps, and this
requires maintaining a linked list of free memory blocks.¹
• The LOH is subject to fragmentation. This means that the freeing of an object
can create a hole in the LOH that can be difficult to fill later. For instance, a
hole left by an 86,000-byte object can be filled only by an object of between
85,000 bytes and 86,000 bytes (unless adjoined by another hole).


Should you anticipate a problem with fragmentation, you can instruct the GC to 
compact the LOH in the next collection, as follows: 


GCSettings.LargeObjectHeapCompactionMode = 
GCLargeObjectHeapCompactionMode.CompactOnce; 
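The CompactOnce setting takes effect at the next blocking full (Gen2) collection. After that collection, the runtime resets the setting to GCLargeObjectHeapCompactionMode.Default. If you want the compaction to happen right away, you can follow the assignment with a forced collection. The following sketch does this, using the hypothetical name LohCompaction.

```csharp
using System;
using System.Runtime;

static class LohCompaction
{
  // Requests a one-off LOH compaction and triggers it immediately.
  // Returns the compaction mode observed after the collection.
  public static GCLargeObjectHeapCompactionMode CompactNow()
  {
    GCSettings.LargeObjectHeapCompactionMode =
      GCLargeObjectHeapCompactionMode.CompactOnce;
    GC.Collect();   // forced, blocking full collection: the LOH is compacted here
    // After compacting, the runtime resets the mode to Default
    return GCSettings.LargeObjectHeapCompactionMode;
  }
}
```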


Another workaround, if your program frequently allocates large arrays, is to 
use .NET Core’s array pooling API (see “Array Pooling” on page 541). 


The LOH is also nongenerational: all objects are treated as Gen2. 
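Because the LOH is nongenerational, GC.GetGeneration reports 2 (GC.MaxGeneration) for an object allocated there, even immediately after allocation. The following sketch shows this. LohDemo is a hypothetical name, and the array sizes are arbitrary choices on either side of the 85,000-byte threshold.

```csharp
using System;

static class LohDemo
{
  public static (int small, int large) Generations()
  {
    var smallArray = new byte[1_000];     // well under the threshold: small object heap
    var largeArray = new byte[100_000];   // over the threshold: allocated on the LOH
    int gs = GC.GetGeneration (smallArray);
    int gl = GC.GetGeneration (largeArray);
    GC.KeepAlive (smallArray);
    GC.KeepAlive (largeArray);
    return (gs, gl);
  }
}
```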





1 The same thing can occur occasionally in the generational heap due to pinning (see “The fixed 
Statement” on page 220 in Chapter 4). 





538 | Chapter 12: Disposal and Garbage Collection 


Workstation versus server collection 


.NET Core provides two garbage collection modes: workstation and server. Workstation
is the default; you can switch to server by adding the following to your application’s
.csproj file:


<PropertyGroup> 
<ServerGarbageCollection>true</ServerGarbageCollection> 
</PropertyGroup> 
Upon building your project, this setting is written to the application's .runtimeconfig.json
file, where it's read by the CLR:


"runtimeOptions": {
  "configProperties": {
    "System.GC.Server": true
  }
}


When server collection is enabled, the CLR allocates a separate heap and GC to each
core. This speeds up collection, but consumes additional memory and CPU resources
(because each core requires its own thread). Should the machine be running
many other processes with server collection enabled, this can lead to CPU oversubscription,
which is particularly harmful on workstations because it makes the OS as
a whole feel unresponsive.


Server collection is available only on multicore systems: on single-core devices (or 
single-core virtual machines), the setting is ignored. 


Background collection 


In both workstation and server modes, the CLR enables background collection by 
default. You can disable it by adding the following to your application's .csproj file: 


<PropertyGroup> 
<ConcurrentGarbageCollection>false</ConcurrentGarbageCollection> 
</PropertyGroup> 


Upon building, this setting is written to the application’s .runtimeconfig.json file:


"runtimeOptions": {
  "configProperties": {
    "System.GC.Concurrent": false
  }
}


The GC must freeze (block) your execution threads for periods during a collection. 
Background collection minimizes these periods of latency, making your application 
more responsive. This comes at the expense of consuming slightly more CPU and 
memory. Hence, by disabling background collection, you accomplish the following: 


• Slightly reduce CPU and memory usage
• Increase the pauses (or latency) when a garbage collection occurs


Background collection works by allowing your application code to run in parallel
with a Gen2 collection. (Gen0 and Gen1 collections are considered sufficiently fast
that they don’t benefit from this parallelism.)


Background collection is an improved version of what was formerly called concurrent
collection: it removes a limitation whereby a concurrent collection would cease
to be concurrent if the Gen0 section filled up while a Gen2 collection was running.
This allows applications that continually allocate memory to be more responsive.


GC notifications 


If you disable background collection, you can ask the GC to notify you just before a
full (blocking) collection will occur. This is intended for server-farm configurations:
the idea is that you divert requests to another server just before a collection. You
then instigate the collection immediately and wait for it to complete before rerouting
requests back to that server.


To start notification, call GC.RegisterForFullGCNotification. Then, start up
another thread (see Chapter 14) that first calls GC.WaitForFullGCApproach. When
this method returns a GCNotificationStatus indicating that a collection is near,
you can reroute requests to other servers and force a manual collection (see the following
section). You then call GC.WaitForFullGCComplete: when this method
returns, collection is complete, and you can again accept requests. You then repeat
the whole cycle.


Forcing Garbage Collection 


You can manually force a garbage collection at any time by calling GC.Collect. Calling
GC.Collect without an argument instigates a full collection. If you pass in an
integer value, only generations up to and including that value are collected, so
GC.Collect(0) performs only a fast Gen0 collection.


In general, you get the best performance by allowing the GC to decide when to collect:
forcing collection can hurt performance by unnecessarily promoting Gen0
objects to Gen1 (and Gen1 objects to Gen2). It can also upset the GC’s self-tuning
ability, whereby the GC dynamically tweaks the thresholds for each generation to
maximize performance as the application executes.


There are exceptions, however. The most common case for intervention is when an
application goes to sleep for a while: a good example is a Windows Service that performs
a daily activity (checking for updates, perhaps). Such an application might
use a System.Timers.Timer to initiate the activity every 24 hours. After completing
the activity, no further code executes for 24 hours, which means that for this period,
no memory allocations are made and so the GC has no opportunity to activate.
Whatever memory the service consumed in performing its activity, it will continue
to consume for the following 24 hours, even with an empty object graph! The solution
is to call GC.Collect right after the daily activity completes.


To ensure the collection of objects for which collection is delayed by finalizers, you
can take the additional step of calling WaitForPendingFinalizers and
re-collecting:


GC.Collect(); 
GC.WaitForPendingFinalizers(); 
GC.Collect(); 


Often this is done in a loop: the act of running finalizers can free up more objects 
that themselves have finalizers. 
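Such a loop might be written as in the following sketch, which rests on our own assumptions. It stops when a pass no longer reduces the figure reported by GC.GetTotalMemory, which is only an approximation. The maxPasses limit and the FullCleanup name are hypothetical.

```csharp
using System;

static class FullCleanup
{
  // Collects and drains finalizers repeatedly until a pass no longer
  // reduces reported memory, or until maxPasses is reached.
  // Returns the number of passes performed.
  public static int CollectFully (int maxPasses = 5)
  {
    long before = GC.GetTotalMemory (false);
    int passes = 0;
    while (passes < maxPasses)
    {
      GC.Collect();
      GC.WaitForPendingFinalizers();   // finalizers may release further objects
      passes++;
      long after = GC.GetTotalMemory (false);
      if (after >= before) break;      // this pass reclaimed nothing more
      before = after;
    }
    GC.Collect();   // reclaim objects whose finalizers have just run
    return passes;
  }
}
```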


Another case for calling GC.Collect is when you're testing a class that has a
finalizer.


Tuning Garbage Collection at Runtime 


The static GCSettings.LatencyMode property determines how the GC balances
latency with overall efficiency. Changing this from its default value of Interactive
to either LowLatency or SustainedLowLatency instructs the CLR to favor quicker
(but more frequent) collections. This is useful if your application needs to respond
very quickly to real-time events. Changing the mode to Batch maximizes throughput
at the expense of potentially poor responsiveness, which is useful for batch
processing.


SustainedLowLatency is not supported if you disable background collection in 
the .runtimeconfig.json file. 


You can also tell the CLR to temporarily suspend garbage collection by calling
GC.TryStartNoGCRegion, and resume it with GC.EndNoGCRegion.
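When you change LatencyMode for a particular stretch of work, it's wise to restore the previous mode afterward. A no-GC region should be ended only if it's still active. The region ends implicitly if allocations exceed the budget or a collection is induced, and in that case EndNoGCRegion throws. The following sketch shows both patterns. GcTuning and its methods are hypothetical names. The budget passed to TryStartNoGCRegion is the total number of bytes you expect to allocate.

```csharp
using System;
using System.Runtime;

static class GcTuning
{
  // Runs 'work' under SustainedLowLatency, then restores the previous mode.
  public static void RunWithLowLatency (Action work)
  {
    GCLatencyMode oldMode = GCSettings.LatencyMode;
    try
    {
      GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
      work();
    }
    finally
    {
      GCSettings.LatencyMode = oldMode;
    }
  }

  // Runs 'work' with garbage collection suspended, assuming it allocates
  // no more than budgetBytes. Returns false if the region couldn't start.
  public static bool RunWithoutGC (long budgetBytes, Action work)
  {
    if (!GC.TryStartNoGCRegion (budgetBytes)) return false;
    try
    {
      work();
    }
    finally
    {
      // Only end the region if it hasn't already ended implicitly
      if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        GC.EndNoGCRegion();
    }
    return true;
  }
}
```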


Memory Pressure 


The runtime decides when to initiate collections based on a number of factors,
including the total memory load on the machine. If your program allocates unmanaged
memory (Chapter 25), the runtime will get an unrealistically optimistic perception
of its memory usage because the CLR knows only about managed memory.
You can mitigate this by instructing the CLR to assume that a specified quantity of
unmanaged memory has been allocated; you do this by calling GC.AddMemoryPressure.
To undo this (when the unmanaged memory is released), call
GC.RemoveMemoryPressure.
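The two calls should be balanced and use the same amount. The following minimal sketch assumes a hypothetical UnmanagedBuffer class that allocates with Marshal.AllocHGlobal. A production version would also need a finalizer or a SafeHandle so that the memory is freed even if Dispose is never called.

```csharp
using System;
using System.Runtime.InteropServices;

sealed class UnmanagedBuffer : IDisposable
{
  public IntPtr Pointer { get; private set; }
  public readonly int Size;

  public UnmanagedBuffer (int size)
  {
    Size = size;
    Pointer = Marshal.AllocHGlobal (size);
    GC.AddMemoryPressure (size);      // tell the GC about memory it can't see
  }

  public void Dispose()
  {
    if (Pointer == IntPtr.Zero) return;   // tolerate repeated calls
    Marshal.FreeHGlobal (Pointer);
    Pointer = IntPtr.Zero;
    GC.RemoveMemoryPressure (Size);   // undo, using the same amount
  }
}
```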


Array Pooling 


If your application frequently instantiates arrays, you can avoid most of the garbage
collection overhead with array pooling. Array pooling works by “renting” an array,
which you later return to a pool for reuse.


To allocate an array, call the Rent method on the ArrayPool<T> class in the
System.Buffers namespace, indicating the size of the array that you'd like:


int[] pooledArray = ArrayPool<int>.Shared.Rent (100);   // at least 100 ints





HowtheGC Works | 541 







This allocates an array of (at least) 100 elements from the global shared array pool. The
pool manager might give you an array that’s larger than what you asked for (typically,
it allocates in powers of 2).


When you've finished with the array, call Return: this releases the array to the pool,
allowing the same array to be rented again:


ArrayPool<int>.Shared.Return (pooledArray);


You can optionally pass in a Boolean value instructing the pool manager to clear the 
array before returning it to the pool. 
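Because Rent can return a larger array than requested, code should rely only on the number of elements it asked for. A try/finally block ensures that the array goes back to the pool even if an exception is thrown. The following sketch uses the hypothetical names PoolingDemo and SumOfSquares. It also passes clearArray: true so that the next renter doesn't see stale data.

```csharp
using System;
using System.Buffers;

static class PoolingDemo
{
  // Computes 0² + 1² + ... + (n-1)² using a rented scratch buffer.
  public static long SumOfSquares (int n)
  {
    int[] buffer = ArrayPool<int>.Shared.Rent (n);   // Length may exceed n
    try
    {
      for (int i = 0; i < n; i++) buffer[i] = i * i;
      long total = 0;
      for (int i = 0; i < n; i++) total += buffer[i];   // use only the first n slots
      return total;
    }
    finally
    {
      // Wipe the contents before the array goes back to the pool
      ArrayPool<int>.Shared.Return (buffer, clearArray: true);
    }
  }
}
```

For example, PoolingDemo.SumOfSquares (100) returns 328350.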


A limitation of array pooling is that nothing prevents you 
from continuing to (illegally) use an array after it’s been 
returned, so you need to code carefully to avoid this scenario. 
Keep in mind that you have the power to break not just your 
own code, but other APIs that use array pooling, too, such as 
ASP.NET Core. 


Rather than using the shared array pool, you can create a custom pool and rent 
from that. This avoids the risk of breaking other APIs, but increases overall memory 
usage (as it reduces the opportunities for reuse): 


var myPool = ArrayPool<int>.Create(); 
int[] array = myPool.Rent (100); 


Managed Memory Leaks 


In unmanaged languages such as C++, you must remember to manually deallocate 
memory when an object is no longer required; otherwise, a memory leak will result. 
In the managed world, this kind of error is impossible due to the CLR’s automatic 
garbage collection system. 


Nonetheless, large and complex .NET applications can exhibit a milder form of the 
same syndrome with the same end result: the application consumes more and more 
memory over its lifetime, until it eventually must be restarted. The good news is that 
managed memory leaks are usually easier to diagnose and prevent. 


Managed memory leaks are caused by unused objects remaining alive by virtue of 
unused or forgotten references. A common candidate is event handlers—these hold 
a reference to the target object (unless the target is a static method). For instance, 
consider the following classes: 


class Host
{
  public event EventHandler Click;
}

class Client
{
  Host _host;
  public Client (Host host)
  {
    _host = host;
    _host.Click += HostClicked;
  }
  void HostClicked (object sender, EventArgs e) { ... }
}
The following test class contains a method that instantiates 1,000 clients: 
class Test
{
  static Host _host = new Host();

  public static void CreateClients()
  {
    Client[] clients = Enumerable.Range (0, 1000)
     .Select (i => new Client (_host))
     .ToArray();

    // Do something with clients ...
  }
}
You might expect that after CreateClients finishes executing, the 1,000 Client 
objects will become eligible for collection. Unfortunately, each client has another 
referee: the _host object whose Click event now references each Client instance. 
This can go unnoticed if the Click event doesn't fire—or if the HostClicked 
method doesn't do anything to attract attention. 


One way to solve this is to make Client implement IDisposable, and in the 
Dispose method, unhook the event handler: 


public void Dispose() { _host.Click -= HostClicked; } 
Consumers of Client then dispose of the instances when they’re done with them: 
Array.ForEach (clients, c => c.Dispose()); 
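Putting this together, a disposable Client might look like the following sketch. The RaiseClick method on Host and the ClickCount property on Client are hypothetical additions. They exist only to make the effect of Dispose observable.

```csharp
using System;

class Host
{
  public event EventHandler Click;

  // Hypothetical helper so that the sketch can fire the event
  public void RaiseClick() => Click?.Invoke (this, EventArgs.Empty);
}

class Client : IDisposable
{
  readonly Host _host;
  public int ClickCount { get; private set; }

  public Client (Host host)
  {
    _host = host;
    _host.Click += HostClicked;
  }

  void HostClicked (object sender, EventArgs e) => ClickCount++;

  // Unhooking removes the host's reference to this client, so the client
  // becomes collectible once nothing else refers to it.
  public void Dispose() => _host.Click -= HostClicked;
}
```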


In “Weak References” on page 545, we describe another solution
to this problem, which can be useful in environments that
tend not to use disposable objects (an example is WPF). In
fact, the WPF framework offers a class called WeakEventManager
that uses a pattern that employs weak references.


Timers 


Forgotten timers can also cause memory leaks (we discuss timers in Chapter 22).
There are two distinct scenarios, depending on the kind of timer. Let’s first look at
the timer in the System.Timers namespace. In the following example, the Foo class
(when instantiated) calls the tmr_Elapsed method once every second:


using System.Timers;

class Foo
{
  Timer _timer;

  Foo()
  {
    _timer = new System.Timers.Timer { Interval = 1000 };
    _timer.Elapsed += tmr_Elapsed;
    _timer.Start();
  }

  void tmr_Elapsed (object sender, ElapsedEventArgs e) { ... }
}


Unfortunately, instances of Foo can never be garbage-collected! The problem
is that .NET Core itself holds references to active timers so that it can fire their
Elapsed events; hence:

• .NET Core will keep _timer alive.
• _timer will keep the Foo instance alive, via the tmr_Elapsed event handler.


The solution is obvious when you realize that Timer implements IDisposable. Disposing
of the timer stops it and ensures that .NET Core no longer references the
object:


class Foo : IDisposable
{
  ...
  public void Dispose() { _timer.Dispose(); }
}


A good guideline is to implement IDisposable yourself if any 
field in your class is assigned an object that implements 
IDisposable. 
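Filling in the rest of the class gives something like the following sketch. The Ticks counter and IsRunning property are hypothetical additions for illustration. The constructor is also made public so that the class can be instantiated.

```csharp
using System;
using System.Threading;
using System.Timers;
using Timer = System.Timers.Timer;   // disambiguate from System.Threading.Timer

class Foo : IDisposable
{
  readonly Timer _timer;
  public int Ticks;                    // incremented on each Elapsed event

  public Foo()
  {
    _timer = new Timer { Interval = 1000 };
    _timer.Elapsed += tmr_Elapsed;
    _timer.Start();
  }

  public bool IsRunning => _timer.Enabled;

  // Elapsed fires on a thread-pool thread, hence the atomic increment
  void tmr_Elapsed (object sender, ElapsedEventArgs e) => Interlocked.Increment (ref Ticks);

  // Stopping the timer lets .NET Core drop its reference to the timer and,
  // through the Elapsed handler, to this Foo instance.
  public void Dispose() => _timer.Dispose();
}
```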


The WPF and Windows Forms timers behave in the same way with respect to what’s 
just been discussed. 


The timer in the System.Threading namespace, however, is special. .NET Core 
doesn't hold references to active threading timers; it instead references the callback 
delegates directly. This means that if you forget to dispose of a threading timer, a 
finalizer can fire that will automatically stop and dispose of the timer: 


static void Main()
{
  var tmr = new System.Threading.Timer (TimerTick, null, 1000, 1000);
  GC.Collect();
  System.Threading.Thread.Sleep (10000);    // Wait 10 seconds
}

static void TimerTick (object notUsed) { Console.WriteLine ("tick"); }


If this example is compiled in “release” mode (debugging disabled and optimizations
enabled), the timer will be collected and finalized before it has a chance to fire
even once! Again, we can fix this by disposing of the timer when we're done with it:


using (var tmr = new System.Threading.Timer (TimerTick, null, 1000, 1000))
{
  GC.Collect();
  System.Threading.Thread.Sleep (10000);    // Wait 10 seconds
}


The implicit call to tmr.Dispose at the end of the using block ensures that the tmr
variable is “used” and so not considered dead by the GC until the end of the block.
Ironically, this call to Dispose actually keeps the object alive longer!


Diagnosing Memory Leaks 


The easiest way to avoid managed memory leaks is to proactively monitor memory
consumption as an application is written. You can obtain the current memory consumption
of a program’s objects as follows (the true argument tells the GC to perform
a collection first):


long memoryUsed = GC.GetTotalMemory (true); 


If you're practicing test-driven development, one possibility is to use unit tests to
assert that memory is reclaimed as expected. If such an assertion fails, you then
need to examine only the changes that you've made recently.
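One way to write such an assertion is to hold the object only through a weak reference, force a collection, and then check whether the reference is still alive. The following sketch uses a hypothetical LeakCheck helper. It creates the object in a separate, non-inlined method so that no local variable in the caller can keep the object reachable.

```csharp
using System;
using System.Runtime.CompilerServices;

static class LeakCheck
{
  // Separate, non-inlined frame: once it returns, nothing on the
  // caller's stack refers to the object produced by 'factory'.
  [MethodImpl (MethodImplOptions.NoInlining)]
  static WeakReference CreateAndForget (Func<object> factory)
    => new WeakReference (factory());

  // Returns true if the object produced by 'factory' is reclaimed
  // by a full collection (that is, nothing else is keeping it alive).
  public static bool IsCollectable (Func<object> factory)
  {
    WeakReference weak = CreateAndForget (factory);
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
    return !weak.IsAlive;
  }
}
```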


If you already have a large application with a managed memory leak, the windbg.exe
tool can assist in finding it. There are also friendlier graphical tools such as Microsoft’s
CLR Profiler, SciTech Memory Profiler, and Red Gate's ANTS Memory
Profiler.


The CLR also exposes numerous event counters to assist with resource monitoring. 


Weak References 


Occasionally, it’s useful to hold a reference to an object that’s “invisible” to the GC in 
terms of keeping the object alive. This is called a weak reference and is implemented 
by the System.WeakReference class. 


To use WeakReference, construct it with a target object: 


var sb = new StringBuilder ("this is a test");
var weak = new WeakReference (sb);
Console.WriteLine (weak.Target);   // this is a test
If a target is referenced only by one or more weak references, the GC will consider 
the target eligible for collection. When the target is collected, the Target property of 
the WeakReference will be null: 


var weak = new WeakReference (new StringBuilder ("weak"));
Console.WriteLine (weak.Target);   // weak
GC.Collect();
Console.WriteLine (weak.Target);   // (nothing)


To avoid the target being collected in between testing for it being null and consuming
it, assign the target to a local variable:


var weak = new WeakReference (new StringBuilder ("weak")); 
var sb = (StringBuilder) weak.Target; 
if (sb != null) { /* Do something with sb */ } 


After a target’s been assigned to a local variable, it has a strong root and so cannot 
be collected while that variable’s in use. 


The following class uses weak references to keep track of all Widget objects that 
have been instantiated, without preventing those objects from being collected: 


class Widget
{
  static List<WeakReference> _allWidgets = new List<WeakReference>();
  public readonly string Name;

  public Widget (string name)
  {
    Name = name;
    _allWidgets.Add (new WeakReference (this));
  }

  public static void ListAllWidgets()
  {
    foreach (WeakReference weak in _allWidgets)
    {
      Widget w = (Widget)weak.Target;
      if (w != null) Console.WriteLine (w.Name);
    }
  }
}


The only proviso with such a system is that the static list will grow over time, accumulating
weak references with null targets. So, you need to implement some
cleanup strategy.
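One simple cleanup strategy is to remove dead entries on demand with List<T>.RemoveAll. In this sketch, Purge and LiveNames are hypothetical additions to the Widget class. LiveNames copies each target to a local variable (via a pattern match) before using it, for the reason given earlier. The class isn't thread-safe; you'd need to add a lock if widgets can be created on multiple threads.

```csharp
using System;
using System.Collections.Generic;

class Widget
{
  static readonly List<WeakReference> _allWidgets = new List<WeakReference>();
  public readonly string Name;

  public Widget (string name)
  {
    Name = name;
    _allWidgets.Add (new WeakReference (this));
  }

  // Removes entries whose targets have been collected;
  // returns how many dead entries were removed.
  public static int Purge() => _allWidgets.RemoveAll (weak => !weak.IsAlive);

  public static List<string> LiveNames()
  {
    var names = new List<string>();
    foreach (WeakReference weak in _allWidgets)
      if (weak.Target is Widget w)     // strong local: can't be collected mid-use
        names.Add (w.Name);
    return names;
  }
}
```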


Weak References and Caching 


One use for WeakReference is to cache large object graphs. This allows memory- 
intensive data to be cached briefly without causing excessive memory consumption: 


_weakCache = new WeakReference (...);  // _weakCache is a field 


var cache = _weakCache.Target; 
if (cache == null) { /* Re-create cache & assign it to _weakCache */ } 


This strategy can be only mildly effective in practice because you have little control
over when the GC fires and what generation it chooses to collect. In particular, if
your cache remains in Gen0, it can be collected within microseconds (and
remember that the GC doesn't collect only when memory is low: it collects regularly
under normal memory conditions). So, at a minimum, you should employ a
two-level cache whereby you start out by holding strong references that you convert
to weak references over time.
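A two-level cache along these lines might look like the following sketch. TwoLevelCache<T> and its Demote method are hypothetical names. You'd call Demote periodically (from a timer, say), so that a value starts out strongly held and becomes collectible only later.

```csharp
using System;

class TwoLevelCache<T> where T : class
{
  readonly Func<T> _factory;
  T _strong;            // level 1: keeps the value alive
  WeakReference _weak;  // level 2: lets the GC reclaim the value

  public TwoLevelCache (Func<T> factory) { _factory = factory; }

  public T Get()
  {
    if (_strong != null) return _strong;
    if (_weak?.Target is T alive) return alive;   // still in memory
    _strong = _factory();                          // re-create and hold strongly
    _weak = new WeakReference (_strong);
    return _strong;
  }

  // Drop the strong reference; the value survives only while
  // something else references it (or until the GC runs).
  public void Demote() { _strong = null; }
}
```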


Weak References and Events 


We saw earlier how events can cause managed memory leaks. The simplest solution 
is to either avoid subscribing in such conditions, or implement a Dispose method to 
unsubscribe. Weak references offer another solution. 


Imagine a delegate that holds only weak references to its targets. Such a delegate
would not keep its targets alive, unless those targets had independent referees. Of
course, this wouldn't prevent a firing delegate from hitting an unreferenced target
in the time between the target being eligible for collection and the GC catching up
with it. For such a solution to be effective, your code must be robust in that scenario.
Assuming that is the case, you can implement a weak delegate class as follows:


public class WeakDelegate<TDelegate> where TDelegate : class
{
  class MethodTarget
  {
    public readonly WeakReference Reference;
    public readonly MethodInfo Method;

    public MethodTarget (Delegate d)
    {
      // d.Target will be null for static method targets:
      if (d.Target != null) Reference = new WeakReference (d.Target);
      Method = d.Method;
    }
  }

  List<MethodTarget> _targets = new List<MethodTarget>();

  public WeakDelegate()
  {
    if (!typeof (TDelegate).IsSubclassOf (typeof (Delegate)))
      throw new InvalidOperationException
        ("TDelegate must be a delegate type");
  }

  public void Combine (TDelegate target)
  {
    if (target == null) return;

    foreach (Delegate d in (target as Delegate).GetInvocationList())
      _targets.Add (new MethodTarget (d));
  }

  public void Remove (TDelegate target)
  {
    if (target == null) return;
    foreach (Delegate d in (target as Delegate).GetInvocationList())
    {
      MethodTarget mt = _targets.Find (w =>
        Equals (d.Target, w.Reference?.Target) &&
        Equals (d.Method.MethodHandle, w.Method.MethodHandle));

      if (mt != null) _targets.Remove (mt);
    }
  }

  public TDelegate Target
  {
    get
    {
      Delegate combinedTarget = null;

      foreach (MethodTarget mt in _targets.ToArray())
      {
        WeakReference wr = mt.Reference;
        object target = wr?.Target;   // Strong local: can't be collected mid-use

        // Static target || alive instance target
        if (wr == null || target != null)
        {
          var newDelegate = Delegate.CreateDelegate (
            typeof (TDelegate), target, mt.Method);
          combinedTarget = Delegate.Combine (combinedTarget, newDelegate);
        }
        else
          _targets.Remove (mt);
      }

      return combinedTarget as TDelegate;
    }
    set
    {
      _targets.Clear();
      Combine (value);
    }
  }
}
I 
This code illustrates several interesting points in C# and the CLR. First, note that we
check that TDelegate is a delegate type in the constructor. This is because of a limitation
in C# prior to version 7.3: the following type constraint was illegal because C# considered
System.Delegate a special type for which constraints were not supported:


... where TDelegate : Delegate    // Not allowed before C# 7.3


Instead, we choose a class constraint, which works in any version of C#, and perform
a runtime check in the constructor.


In the Combine and Remove methods, we perform the reference conversion from
target to Delegate via the as operator rather than the more usual cast operator.
This is because C# disallows the cast operator with this type parameter, owing to
a potential ambiguity between a custom conversion and a reference conversion.
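To see the difference concretely, here is a minimal sketch of the two conversions that do compile when a type parameter has only a class constraint. The class and method names are hypothetical and are not part of WeakDelegate. The as operator yields null on a mismatch. A double cast through object throws InvalidCastException instead. A direct (Delegate) target cast in the same position is rejected by the compiler.

```csharp
using System;

static class GenericDelegateConversion
{
  // With only a 'class' constraint, (Delegate) target would not compile here,
  // but 'as' performs a run-time reference conversion (null on mismatch).
  public static Delegate ViaAs<T> (T target) where T : class
    => target as Delegate;

  // Casting through object first also compiles, but throws on mismatch.
  public static Delegate ViaObject<T> (T target) where T : class
    => (Delegate) (object) target;
}
```

The as form suits WeakDelegate because the constructor has already verified that TDelegate is a delegate type, so the conversion cannot fail for a non-null target.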


We then call GetInvocationList because these methods might be called with multi- 
cast delegates—delegates with more than one method recipient. 


In the Target property, we build up a multicast delegate that combines all the dele- 
gates referenced by weak references whose targets are alive, removing the remaining 
(dead) references from the list to avoid the _targets list endlessly growing. (We 
could improve our class by doing the same in the Combine method; yet another 
improvement would be to add locks for thread safety [see “Locking and Thread 
Safety” on page 582 in Chapter 14]). We also allow delegates without a weak refer- 
ence at all; these represent delegates whose target is a static method. 
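The core technique in the Target property can be tried in isolation. It splits a delegate with GetInvocationList and then rebuilds each part from its Target and Method via Delegate.CreateDelegate. The sketch below uses strong references and hypothetical Counter and DelegateRebuilder types, and it assumes the delegate type is Action. As with MethodTarget, a null Target stands for a static method.

```csharp
using System;

class Counter
{
  public int Count;
  public void Increment() => Count++;

  public static int StaticCalls;
  public static void StaticIncrement() => StaticCalls++;
}

static class DelegateRebuilder
{
  // Splits a (possibly multicast) delegate into single-method delegates and
  // rebuilds an equivalent delegate from each one's Target and Method.
  public static Action Rebuild (Action multicast)
  {
    Delegate combined = null;                 // Delegate.Combine treats null as "nothing yet"
    foreach (Delegate d in multicast.GetInvocationList())
    {
      // d.Target is null when d refers to a static method.
      Delegate single = Delegate.CreateDelegate (typeof (Action), d.Target, d.Method);
      combined = Delegate.Combine (combined, single);
    }
    return (Action) combined;
  }
}
```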


The following illustrates how to consume this delegate in implementing an event: 
public class Foo
{
  WeakDelegate<EventHandler> _click = new WeakDelegate<EventHandler>();

  public event EventHandler Click
  {
    add { _click.Combine (value); } remove { _click.Remove (value); }
  }

  protected virtual void OnClick (EventArgs e)
    => _click.Target?.Invoke (this, e);
}
13


Diagnostics 








When things go wrong, it’s important that information is available to aid in diagnos- 
ing the problem. An Integrated Development Environment (IDE) or debugger can 
assist greatly to this effect—but it is usually available only during development. 
After an application ships, the application itself must gather and record diagnostic 
information. To meet this requirement, .NET Core provides a set of facilities to log 
diagnostic information, monitor application behavior, detect runtime errors, and 
integrate with debugging tools if available. 


Some diagnostic tools and APIs are Windows specific because they rely on features 
of the Windows operating system. In an effort to prevent platform-specific APIs 
from cluttering .NET Core, Microsoft has shipped them in separate NuGet packages 
that you can optionally reference. There are more than a dozen Windows-specific
packages, which you can reference all at once with the Microsoft.Windows.Compatibility
“master” package.


The types in this chapter are defined primarily in the System.Diagnostics 
namespace. 


Conditional Compilation 


You can conditionally compile any section of code in C# with preprocessor directives. 
Preprocessor directives are special instructions to the compiler that begin with the 
# symbol (and, unlike other C# constructs, must appear on a line of their own). Log- 
ically, they execute before the main compilation takes place (although in practice, 
the compiler processes them during the lexical parsing phase). The preprocessor 
directives for conditional compilation are #if, #else, #endif, and #elif. 


The #if directive instructs the compiler to ignore a section of code unless a speci- 
fied symbol has been defined. You can define a symbol in source code by using the 
#define directive (in which case the symbol applies to just that file), or in the .csproj 
file by using a <DefineConstants> element (in which case the symbol applies to the
whole assembly):
#define TESTMODE            // #define directives must be at top of file
                            // Symbol names are uppercase by convention.
using System;

class Program
{
  static void Main()
  {
#if TESTMODE
    Console.WriteLine ("in test mode!");     // OUTPUT: in test mode!
#endif
  }
}


If we deleted the first line, the program would compile with the Console.WriteLine 
statement completely eliminated from the executable, as though it were commented 
out. 


The #else statement is analogous to C#’s else statement, and #elif is equivalent to 
#else followed by #if. The ||, &&, and ! operators perform or, and, and not 
operations: 


#if TESTMODE && !PLAYMODE // if TESTMODE and not PLAYMODE 


Keep in mind, however, that you’re not building an ordinary C# expression, and the
symbols upon which you operate have absolutely no connection to variables, static
or otherwise.
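As a small illustration of how #if, #elif, and #else select a branch, the following sketch uses a hypothetical BuildMode class that returns a different string depending on which symbols are defined. It assumes neither TESTMODE nor PLAYMODE is defined, either by #define or in <DefineConstants>, so only the #else branch is compiled.

```csharp
static class BuildMode
{
  // Exactly one return statement survives preprocessing.
  public static string Current()
  {
#if TESTMODE && !PLAYMODE
    return "test";
#elif PLAYMODE
    return "play";
#else
    return "normal";
#endif
  }
}
```

Adding #define TESTMODE at the top of the file would make Current return "test" instead, and the other two return statements would never be compiled.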


You can define symbols that apply to every file in an assembly by editing the .csproj 
file (or in Visual Studio, by going to the Build tab in the Project Properties window). 
The following defines two constants, TESTMODE and PLAYMODE: 


<PropertyGroup> 
<DefineConstants>TESTMODE;PLAYMODE</DefineConstants>
</PropertyGroup> 
If you've defined a symbol at the assembly level and then want to “undefine” it for a 
particular file, you can do so by using the #undef directive. 


Conditional Compilation Versus Static Variable Flags 
You could instead implement the preceding example with a simple static field: 


static internal bool TestMode = true; 


static void Main() 


{ 


if (TestMode) Console.WriteLine ("in test mode!"); 


} 


This has the advantage of allowing runtime configuration. So, why choose condi- 
tional compilation? The reason is that conditional compilation can take you places 
variable flags cannot, such as the following: 
• Conditionally including an attribute

• Changing the declared type of a variable

• Switching between different namespaces or type aliases in a using directive; for
  example:

    using TestType =
      #if V2
        MyCompany.Widgets.GadgetV2;
      #else
        MyCompany.Widgets.Gadget;
      #endif
You can even perform major refactoring under a conditional compilation directive, 
so you can instantly switch between old and new versions, and write libraries that 
can compile against multiple Framework versions, leveraging the latest Framework 
features where available. 


Another advantage of conditional compilation is that debugging code can refer to 
types in assemblies that are not included in deployment. 


The Conditional Attribute 


The Conditional attribute instructs the compiler to ignore any calls to a particular 
class or method, if the specified symbol has not been defined. 


To see how this is useful, suppose that you write a method for logging status infor- 
mation as follows: 


static void LogStatus (string msg)
{
  string logFilePath = ...
  System.IO.File.AppendAllText (logFilePath, msg + "\r\n");
}


Now imagine that you want this to execute only if the LOGGINGMODE symbol is
defined. The first solution is to wrap all calls to LogStatus in an #if directive:


#if LOGGINGMODE 
LogStatus ("Message Headers: " + GetMsgHeaders()); 
#endif 


This gives an ideal result, but it is tedious. The second solution is to put the #if
directive inside the LogStatus method. This, however, is problematic should
LogStatus be called as follows:


LogStatus ("Message Headers: " + GetComplexMessageHeaders());


GetComplexMessageHeaders would always be called—which might incur a perfor- 
mance hit. 
We can combine the functionality of the first solution with the convenience of the
second by attaching the Conditional attribute (defined in System.Diagnostics) to
the LogStatus method:


[Conditional ("LOGGINGMODE")]
static void LogStatus (string msg)
{
  ...
}

This instructs the compiler to treat calls to LogStatus as though they were wrapped 
in an #if LOGGINGMODE directive. If the symbol is not defined, any calls to 
LogStatus are eliminated entirely in compilation—including their argument evalu- 
ation expressions. (Hence any side-effecting expressions will be bypassed.) This 
works even if LogStatus and the caller are in different assemblies. 
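The following sketch makes the argument-elimination behavior observable. It uses hypothetical counters to record whether LogStatus ran and whether its argument expression was evaluated. It assumes LOGGINGMODE is not defined for the build.

```csharp
using System.Diagnostics;

static class ConditionalDemo
{
  public static int LogCalls;      // how many times LogStatus actually ran
  public static int HeaderCalls;   // how many times the argument expression was evaluated

  // With LOGGINGMODE undefined, the compiler removes every call to this method.
  [Conditional ("LOGGINGMODE")]
  static void LogStatus (string msg) { LogCalls++; }

  // Hypothetical stand-in for an expensive method.
  static string GetComplexMessageHeaders() { HeaderCalls++; return "headers"; }

  public static void Run()
  {
    // The whole call, including the string concatenation, is dropped.
    LogStatus ("Message Headers: " + GetComplexMessageHeaders());
  }
}
```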


Another benefit of [Conditional] is that the conditionality 
check is performed when the caller is compiled, rather than 
when the called method is compiled. This is beneficial because 
it allows you to write a library containing methods such as
LogStatus and build just one version of that library.


The Conditional attribute is ignored at runtime—it’s purely an instruction to the 
compiler. 


Alternatives to the Conditional attribute 


The Conditional attribute is useless if you need to dynamically enable or disable 
functionality at runtime: instead, you must use a variable-based approach. This 
leaves the question of how to elegantly circumvent the evaluation of arguments 
when calling conditional logging methods. A functional approach solves this: 


using System;
using System.Linq;

class Program
{
  public static bool EnableLogging;

  static void LogStatus (Func<string> message)
  {
    string logFilePath = ...
    if (EnableLogging)
      System.IO.File.AppendAllText (logFilePath, message() + "\r\n");
  }
}


A lambda expression lets you call this method without syntax bloat: 


LogStatus ( () => "Message Headers: " + GetComplexMessageHeaders() );


If EnableLogging is false, GetComplexMessageHeaders is never evaluated. 
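Here is a self-contained variation of that approach that runs as-is. It assumes an in-memory list in place of the log file. The GetComplexMessageHeaders body and the evaluation counter are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;

static class LazyLogger
{
  public static bool EnableLogging;

  // In-memory stand-in for the log file (an assumption for this sketch).
  public static readonly List<string> Entries = new List<string>();
  public static int HeaderEvaluations;

  static void LogStatus (Func<string> message)
  {
    if (EnableLogging) Entries.Add (message());   // message() runs only when enabled
  }

  static string GetComplexMessageHeaders()
  {
    HeaderEvaluations++;
    return "Content-Type: text/plain";
  }

  public static void LogHeaders() =>
    LogStatus (() => "Message Headers: " + GetComplexMessageHeaders());
}
```

The cost of this approach is a delegate (and possibly a closure) allocation per call, even when logging is disabled. That allocation is usually far cheaper than building the message.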
Debug and Trace Classes


Debug and Trace are static classes that provide basic logging and assertion capabili- 
ties. The two classes are very similar; the main differentiator is their intended use. 
The Debug class is intended for debug builds; the Trace class is intended for both 
debug and release builds. To this effect: 


• All methods of the Debug class are defined with [Conditional("DEBUG")].


• All methods of the Trace class are defined with [Conditional("TRACE")].


This means that all calls that you make to Debug or Trace are eliminated by the 
compiler unless you define DEBUG or TRACE symbols. (Visual Studio provides check- 
boxes for defining these symbols in the Build tab of Project Properties and enables 
the TRACE symbol by default with new projects.) 


Both the Debug and Trace classes provide Write, WriteLine, and WriteIf methods. 
By default, these send messages to the debugger’s output window: 


Debug.Write ("Data"); 

Debug.WriteLine (23 * 34); 

int x = 5, y = 3; 

Debug.WriteIf (x > y, "x is greater than y"); 
The Trace class also provides the methods TraceInformation, TraceWarning, and 
TraceError. The difference in behavior between these and the Write methods 
depends on the active TraceListeners (we cover this in “TraceListener” on page 
556). 


Fail and Assert 


The Debug and Trace classes both provide Fail and Assert methods. Fail sends 
the message to each TraceListener in the Debug or Trace class's Listeners collec- 
tion (see the following section), which by default writes the message to the debug 
output: 


Debug.Fail ("File data.txt does not exist!"); 


Assert simply calls Fail if the bool argument is false—this is called making an 
assertion and indicates a bug in the code if violated. Specifying a failure message is 
optional: 


Debug.Assert (File.Exists ("data.txt"), "File data.txt does not exist!"); 
var result =... 
Debug.Assert (result != null); 


The Write, Fail, and Assert methods are also overloaded to accept a string cate- 
gory in addition to the message, which can be useful in processing the output. 


An alternative to assertion is to throw an exception if the opposite condition is true. 
This is a common practice when validating method arguments: 
public void ShowMessage (string message)
{
  if (message == null) throw new ArgumentNullException ("message");
}


Such “assertions” are compiled unconditionally and are less flexible in that you can't 
control the outcome of a failed assertion via TraceListeners. And technically, 
they’re not assertions. An assertion is something that, if violated, indicates a bug in 
the current method's code. Throwing an exception based on argument validation 
indicates a bug in the caller's code. 
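The distinction can be seen in a single method. In this sketch, Normalize is a hypothetical helper. The ArgumentNullException guards against a bug in the caller. Debug.Assert documents an invariant of the method's own logic, and it is compiled only when DEBUG is defined.

```csharp
using System;
using System.Diagnostics;

static class MessageFormatter
{
  public static string Normalize (string message)
  {
    // Argument validation: a null here means the caller has a bug, so throw.
    if (message == null) throw new ArgumentNullException (nameof (message));

    string result = message.Trim();

    // Assertion: if this ever fails, the bug is in this method's own code.
    Debug.Assert (result.Length <= message.Length, "Trim should never lengthen a string");
    return result;
  }
}
```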


TraceListener 


The Trace class has a static Listeners property that returns a collection of
TraceListener instances. These are responsible for processing the content emitted by the
Write, Fail, and Trace methods.


By default, the Listeners collection of each includes a single listener
(DefaultTraceListener). The default listener has two key features:


• When connected to a debugger such as Visual Studio, messages are written to
the debug output window; otherwise, message content is ignored.


• When the Fail method is called (or an assertion fails), the application is
terminated.


You can change this behavior by (optionally) removing the default listener and then 
adding one or more of your own. You can write trace listeners from scratch (by sub- 
classing TraceListener) or use one of the predefined types: 


• TextWriterTraceListener writes to a Stream or TextWriter or appends to a
file.


• EventLogTraceListener writes to the Windows event log (Windows only).


• EventProviderTraceListener writes to the Event Tracing for Windows
(ETW) subsystem (cross-platform support).


TextWriterTraceListener is further subclassed to ConsoleTraceListener,
DelimitedListTraceListener, XmlWriterTraceListener, and
EventSchemaTraceListener.
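To write a listener from scratch, you need to override only the abstract Write and WriteLine methods. The following sketch of a hypothetical in-memory listener, which might be handy in unit tests, collects completed lines in a list. The inherited TraceEvent method ends up calling these two overrides, first for its header and then for the message.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical listener that keeps each completed line in memory.
class MemoryTraceListener : TraceListener
{
  readonly StringBuilder _pending = new StringBuilder();
  public List<string> Lines { get; } = new List<string>();

  // Write may be called several times before a line is complete.
  public override void Write (string message) => _pending.Append (message);

  public override void WriteLine (string message)
  {
    _pending.Append (message);
    Lines.Add (_pending.ToString());
    _pending.Clear();
  }
}
```

You would register it like any other listener, for example Trace.Listeners.Add (new MemoryTraceListener()).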


The following example clears Trace’s default listener and then adds three listeners— 
one that appends to a file, one that writes to the console, and one that writes to the 
Windows event log: 


// Clear the default listener: 
Trace.Listeners.Clear(); 


// Add a writer that appends to the trace.txt file: 
Trace.Listeners.Add (new TextWriterTraceListener ("trace.txt")); 
// Obtain the Console's output stream, then add that as a listener:
System.IO.TextWriter tw = Console.Out;
Trace.Listeners.Add (new TextWriterTraceListener (tw));


// Set up a Windows Event log source and then create/add listener. 
// CreateEventSource requires administrative elevation, so this would 
// typically be done in application setup. 
if (!EventLog.SourceExists ("DemoApp")) 
EventLog.CreateEventSource ("DemoApp", "Application"); 


Trace.Listeners.Add (new EventLogTraceListener ("DemoApp")); 


In the case of the Windows event log, messages that you write with the Write, Fail, 
or Assert method always display as Information messages in the Windows event 
viewer. Messages that you write via the TraceWarning and TraceError methods, 
however, show up as warnings or errors. 


TraceListener also has a Filter of type TraceFilter that you can set to control
whether a message gets written to that listener. To do this, you either instantiate one
of the predefined subclasses (EventTypeFilter or SourceFilter), or subclass
TraceFilter and override the ShouldTrace method. You could use this to filter by
category, for instance.
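For example, here is a sketch of a hypothetical filter that passes only Critical, Error, and Warning events. It relies on the numeric ordering of TraceEventType (Critical = 1, Error = 2, Warning = 4, Information = 8, Verbose = 16).

```csharp
using System.Diagnostics;

// Hypothetical filter: lets through only Critical, Error, and Warning events.
class SevereEventsFilter : TraceFilter
{
  public override bool ShouldTrace (TraceEventCache cache, string source,
    TraceEventType eventType, int id, string formatOrMessage,
    object[] args, object data1, object[] data)
    => eventType <= TraceEventType.Warning;
}
```

You would attach it with listener.Filter = new SevereEventsFilter(). For this particular rule, the predefined new EventTypeFilter (SourceLevels.Warning) should give the same result without a custom class. A subclass is worthwhile when the decision depends on the source or message text.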
TraceListener also defines IndentLevel and IndentSize properties for controlling
indentation, and the TraceOutputOptions property for writing extra data: 


TextWriterTraceListener tl = new TextWriterTraceListener (Console.Out); 
tl.TraceOutputOptions = TraceOptions.DateTime | TraceOptions.Callstack; 


TraceOutputOptions are applied when using the Trace methods: 


Trace.TraceWarning ("Orange alert"); 


DiagTest.vshost.exe Warning: 0 : Orange alert
    DateTime=2007-03-08T05:57:13.6250000Z
    Callstack=   at System.Environment.GetStackTrace(Exception e, Boolean
needFileInfo)
   at System.Environment.get_StackTrace() ...


Flushing and Closing Listeners 


Some listeners, such as TextWriterTraceListener, ultimately write to a stream that 
is subject to caching. This has two implications: 


• A message might not appear in the output stream or file immediately.


• You must close (or at least flush) the listener before your application ends;
otherwise, you lose what's in the cache (up to 4 KB, by default, if you're writing
to a file).


The Trace and Debug classes provide static Close and Flush methods that call Close
or Flush on all listeners (which in turn calls Close or Flush on any underlying
writers and streams). Close implicitly calls Flush, closes file handles, and prevents
further data from being written.


As a general rule, call Close before an application ends and call Flush any time you 
want to ensure that current message data is written. This applies if you're using 
stream- or file-based listeners. 
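The effect of Flush and Close can be observed without touching the filesystem. This sketch, using the hypothetical name ListenerFlushDemo, points a TextWriterTraceListener at a StringWriter that stands in for a file. It then reads back what reached the writer.

```csharp
using System.Diagnostics;
using System.IO;

static class ListenerFlushDemo
{
  public static string Capture()
  {
    var buffer = new StringWriter();                 // stand-in for a file
    var listener = new TextWriterTraceListener (buffer);

    listener.WriteLine ("Service started");
    listener.Flush();   // push anything buffered through to the underlying writer
    listener.Close();   // also closes the underlying writer

    return buffer.ToString();   // a StringWriter's text stays readable after Close
  }
}
```

With a file-based listener, skipping the Flush and Close calls is exactly how the last few kilobytes of output get lost.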


Trace and Debug also provide an AutoFlush property, which, if true, forces a Flush
after every message.


It’s a good policy to set AutoFlush to true on Debug and Trace
if you're using any file- or stream-based listeners. Otherwise, if
an unhandled exception or critical error occurs, the last 4 KB
of diagnostic information can be lost.


Debugger Integration 


Sometimes, it’s useful for an application to interact with a debugger if one is avail- 
able. During development, the debugger is usually your IDE (e.g., Visual Studio); in 
deployment, the debugger is more likely to be one of the lower-level debugging 
tools, such as WinDbg, Cordbg, or Mdbg. 


Attaching and Breaking 


The static Debugger class in System.Diagnostics provides basic functions for
interacting with a debugger, namely Break, Launch, Log, and IsAttached.


A debugger must first attach to an application in order to debug it. If you start an 
application from within an IDE, this happens automatically, unless you request 
otherwise (by choosing “Start without debugging”). Sometimes, though, it’s incon- 
venient or impossible to start an application in debug mode within the IDE. An 
example is a Windows Service application or (ironically) a Visual Studio designer. 
One solution is to start the application normally and then, in your IDE, choose 
Debug Process. This doesn’t allow you to set breakpoints early in the program’s exe- 
cution, however. 


The workaround is to call Debugger.Break from within your application. This 
method launches a debugger, attaches to it, and suspends execution at that point. 
(Launch does the same, but without suspending execution.) After it’s attached, you 
can log messages directly to the debugger’s output window with the Log method. 
You can verify whether you're attached to a debugger by checking the IsAttached 
property. 
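A common pattern is to break only when a debugger is already attached, so that the same code is harmless in normal runs. The following sketch combines IsAttached, Log, and Break. DebugCheckpoint and its label parameter are hypothetical.

```csharp
using System.Diagnostics;

static class DebugCheckpoint
{
  // Returns false (and does nothing) when no debugger is attached.
  public static bool Hit (string label)
  {
    if (!Debugger.IsAttached) return false;

    Debugger.Log (0, "Checkpoint", label + "\n");   // goes to the debugger's output window
    Debugger.Break();                                 // suspend execution at this point
    return true;
  }
}
```

To prompt for a debugger when none is attached, you would call Debugger.Launch instead.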


Debugger Attributes 


The DebuggerStepThrough and DebuggerHidden attributes provide suggestions to 
the debugger on how to handle single-stepping for a particular method, constructor, 
or class. 
DebuggerStepThrough requests that the debugger step through a function without
any user interaction. This attribute is useful in automatically generated methods and
in proxy methods that forward the real work to a method somewhere else. In the
latter case, the debugger will still show the proxy method in the call stack if a
breakpoint is set within the “real” method, unless you also add the DebuggerHidden
attribute. You can combine these two attributes on proxies to help the user focus on
debugging the application logic rather than the plumbing:


[DebuggerStepThrough, DebuggerHidden]
void DoWorkProxy() 


{ 
// setup... 
DoWork(); 
// teardown... 


} 


void DoWork() {...} // Real method... 


Processes and Process Threads 


We described in the last section of Chapter 6 how to use Process.Start to launch a 
new process. The Process class also allows you to query and interact with other 
processes running on the same or another computer. The Process class is part 
of .NET Standard 2.0, although its features are restricted for the UWP platform. 


Examining Running Processes 


The Process.GetProcessXXX methods retrieve a specific process by name or pro- 
cess ID, or all processes running on the current or nominated computer. This 
includes both managed and unmanaged processes. Each Process instance has a 
wealth of properties mapping statistics such as name, ID, priority, memory and pro- 
cessor utilization, window handles, and so on. The following sample enumerates all 
the running processes on the current computer: 


foreach (Process p in Process.GetProcesses())
using (p)
{
  Console.WriteLine (p.ProcessName);
  Console.WriteLine ("   PID:      " + p.Id);
  Console.WriteLine ("   Memory:   " + p.WorkingSet64);
  Console.WriteLine ("   Threads:  " + p.Threads.Count);
}


Process.GetCurrentProcess returns the current process. 


You can terminate a process by calling its Kill method. 
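The following sketch formats a few of those statistics for the current process. ProcessSummary is a hypothetical helper. It disposes the Process instance afterward because Process implements IDisposable.

```csharp
using System.Diagnostics;

static class ProcessSummary
{
  // Formats a few of the statistics that a Process instance exposes.
  public static string Describe (Process p) =>
    p.ProcessName + " (PID " + p.Id + "): " +
    p.WorkingSet64 + " bytes working set, " +
    p.Threads.Count + " threads";

  public static string DescribeCurrent()
  {
    // Release the underlying process handle when done.
    using (Process current = Process.GetCurrentProcess())
      return Describe (current);
  }
}
```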
Examining Threads in a Process


You can also enumerate over the threads of other processes, with the
Process.Threads property. The objects that you get, however, are not
System.Threading.Thread objects; they're ProcessThread objects and are intended for
administrative rather than synchronization tasks. A ProcessThread object provides
diagnostic information about the underlying thread and allows you to control some
aspects of it such as its priority and processor affinity:


public void EnumerateThreads (Process p)
{
  foreach (ProcessThread pt in p.Threads)
  {
    Console.WriteLine (pt.Id);
    Console.WriteLine ("   State:    " + pt.ThreadState);
    Console.WriteLine ("   Priority: " + pt.PriorityLevel);
    Console.WriteLine ("   Started:  " + pt.StartTime);
    Console.WriteLine ("   CPU time: " + pt.TotalProcessorTime);
  }
}


StackTrace and StackFrame 


The StackTrace and StackFrame classes provide a read-only view of an execution 
call stack. You can obtain stack traces for the current thread or an Exception object. 
Such information is useful mostly for diagnostic purposes, though you also can use 
it in programming (hacks). StackTrace represents a complete call stack;
StackFrame represents a single method call within that stack.


If you just need to know the name and line number of the call- 
ing method, caller info attributes can provide an easier and 
faster alternative. We cover this topic in “Caller Info 
Attributes” on page 206 in Chapter 4. 
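The sketch below contrasts the two approaches by asking for the caller's name both ways. The class name is hypothetical. NoInlining is applied because JIT inlining can merge frames and change what StackFrame (1) reports.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class CallerNames
{
  // Frame 0 is this method; frame 1 is whoever called it.
  [MethodImpl (MethodImplOptions.NoInlining)]
  public static string ViaStackFrame() => new StackFrame (1).GetMethod().Name;

  // The compiler substitutes the caller's name at compile time: no stack walk.
  public static string ViaCallerInfo ([CallerMemberName] string memberName = null)
    => memberName;

  [MethodImpl (MethodImplOptions.NoInlining)]
  public static string[] Probe() => new[] { ViaStackFrame(), ViaCallerInfo() };
}
```

Both techniques report "Probe" here. The caller info version costs nothing at runtime and is unaffected by inlining.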


If you instantiate a StackTrace object with no arguments—or with a bool 
argument—you get a snapshot of the current thread’s call stack. The bool argument, 
if true, instructs StackTrace to read the assembly .pdb (program database) files if they
are present, giving you access to filename, line number, and column offset data.
Program database files are generated when you compile with the /debug switch. (Visual
Studio compiles with this switch unless you request otherwise via Advanced Build 
Settings.) 


After you've obtained a StackTrace, you can examine a particular frame by calling 
GetFrame—or obtain the whole lot by using GetFrames: 


static void Main() { A (); }
static void A()    { B (); }
static void B()    { C (); }
static void C()
{
  StackTrace s = new StackTrace (true);

  Console.WriteLine ("Total frames:   " + s.FrameCount);
  Console.WriteLine ("Current method: " + s.GetFrame(0).GetMethod().Name);
  Console.WriteLine ("Calling method: " + s.GetFrame(1).GetMethod().Name);
  Console.WriteLine ("Entry method:   " + s.GetFrame
                                       (s.FrameCount-1).GetMethod().Name);
  Console.WriteLine ("Call stack:");
  foreach (StackFrame f in s.GetFrames())
    Console.WriteLine (
      "  File: "   + f.GetFileName() +
      "  Line: "   + f.GetFileLineNumber() +
      "  Col: "    + f.GetFileColumnNumber() +
      "  Offset: " + f.GetILOffset() +
      "  Method: " + f.GetMethod().Name);
}


Here's the output: 


Total frames:   4
Current method: C
Calling method: B
Entry method:   Main
Call stack:
  File: C:\Test\Program.cs  Line: 15  Col: 4   Offset: 7  Method: C
  File: C:\Test\Program.cs  Line: 12  Col: 22  Offset: 6  Method: B
  File: C:\Test\Program.cs  Line: 11  Col: 22  Offset: 6  Method: A
  File: C:\Test\Program.cs  Line: 10  Col: 25  Offset: 6  Method: Main
The Intermediate Language (IL) offset indicates the offset of
the instruction that will execute next, not the instruction
that’s currently executing. Peculiarly, though, the line and
column number (if a .pdb file is present) usually indicate the
actual execution point.


This happens because the CLR does its best to infer the actual 
execution point when calculating the line and column from 
the IL offset. The compiler emits IL in such a way as to make 
this possible—including inserting nop (no-operation) instruc- 
tions into the IL stream. 


Compiling with optimizations enabled, however, disables the 
insertion of nop instructions and so the stack trace might 
show the line and column number of the next statement to 
execute. Obtaining a useful stack trace is further hampered by 
the fact that optimization can pull other tricks, including col- 
lapsing entire methods. 


A shortcut to obtaining the essential information for an entire StackTrace is to call 
ToString on it. Here's what the result looks like: 


at DebugTest.Program.C() in C:\Test\Program.cs:line 16 
at DebugTest.Program.B() in C:\Test\Program.cs:line 12 
at DebugTest.Program.A() in C:\Test\Program.cs:line 11 
at DebugTest.Program.Main() in C:\Test\Program.cs:line 10 
You can also obtain the stack trace for an Exception object (showing what led up to
the exception being thrown) by passing the Exception into StackTrace’s
constructor.


Exception already has a StackTrace property; however, this 
property returns a simple string—not a StackTrace object. A 
StackTrace object is far more useful in logging exceptions 
that occur after deployment—where no .pdb files are available 
—because you can log the IL offset in lieu of line and column 
numbers. With an IL offset and ildasm, you can pinpoint 
where within a method an error occurred. 
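The following sketch logs method names with IL offsets from an exception's stack trace. It passes false to skip .pdb lookup, which mirrors a deployed application without .pdb files. The names are hypothetical, and the offsets you see will vary with the compiler and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;

static class ExceptionFrames
{
  [MethodImpl (MethodImplOptions.NoInlining)]
  static void Thrower() => throw new InvalidOperationException ("boom");

  // Returns "MethodName@ILOffset" for each frame between the throw and the catch.
  [MethodImpl (MethodImplOptions.NoInlining)]
  public static string[] Capture()
  {
    try
    {
      Thrower();
    }
    catch (Exception ex)
    {
      var trace = new StackTrace (ex, false);   // false: don't read .pdb files
      return trace.GetFrames()
                  .Select (f => f.GetMethod().Name + "@" + f.GetILOffset())
                  .ToArray();
    }
    return Array.Empty<string>();
  }
}
```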


Windows Event Logs 


The Win32 platform provides a centralized logging mechanism, in the form of the 
Windows event logs. 


The Debug and Trace classes we used earlier write to a Windows event log if you 
register an EventLogTraceListener. With the EventLog class, however, you can 
write directly to a Windows event log without using Trace or Debug. You can also 
use this class to read and monitor event data. 


Writing to the Windows event log makes sense in a Windows 
Service application, because if something goes wrong, you 
can’t pop up a user interface directing the user to some special
file where diagnostic information has been written. Also, 
because it’s common practice for services to write to the Win- 
dows event log, this is the first place an administrator is likely 
to look if your service falls over. 


There are three standard Windows event logs, identified by these names:


• Application

• System

• Security


The Application log is where most applications normally write.


Writing to the Event Log


To write to a Windows event log:


1. Choose one of the three event logs (usually Application). 


2. Decide on a source name and create it if necessary (create requires administra- 
tive permissions). 


3. Call EventLog.WriteEntry with the log name, source name, and message data. 
The source name is an easily identifiable name for your application. You must
register a source name before you use it; the CreateEventSource method performs this
function. You can then call WriteEntry:


const string SourceName = "MyCompany.WidgetServer";

// CreateEventSource requires administrative permissions, so this would
// typically be done in application setup.
if (!EventLog.SourceExists (SourceName))
  EventLog.CreateEventSource (SourceName, "Application");

EventLog.WriteEntry (SourceName,
  "Service started; using configuration file=...",
  EventLogEntryType.Information);


EventLogEntryType can be Information, Warning, Error, SuccessAudit, or
FailureAudit. Each displays with a different icon in the Windows event viewer. You
can also optionally specify a category and event ID (each is a number of your own
choosing) and provide optional binary data.


CreateEventSource also allows you to specify a machine name: this is to write to 
another computer's event log, if you have sufficient permissions. 


Reading the Event Log 
To read an event log, instantiate the EventLog class with the name of the log that
you want to access and optionally the name of another computer on which the log
resides. Each log entry can then be read via the Entries collection property:


EventLog log = new EventLog ("Application");

Console.WriteLine ("Total entries: " + log.Entries.Count);

EventLogEntry last = log.Entries [log.Entries.Count - 1];
Console.WriteLine ("Index:   " + last.Index);
Console.WriteLine ("Source:  " + last.Source);
Console.WriteLine ("Type:    " + last.EntryType);
Console.WriteLine ("Time:    " + last.TimeWritten);
Console.WriteLine ("Message: " + last.Message);


You can enumerate over all logs for the current (or another) computer via the static
method EventLog.GetEventLogs (this requires administrative privileges for full
access):


foreach (EventLog log in EventLog.GetEventLogs()) 
Console.WriteLine (log.LogDisplayName) ; 


This normally prints, at a minimum, Application, Security, and System. 
Monitoring the Event Log


You can be alerted whenever an entry is written to a Windows event log, via the 
EntryWritten event. This works for event logs on the local computer, and it fires 
regardless of what application logged the event. 


To enable log monitoring: 


1. Instantiate an EventLog and set its EnableRaisingEvents property to true. 


2. Handle the EntryWritten event. 


For example: 


static void Main()
{
  using (var log = new EventLog ("Application"))
  {
    log.EnableRaisingEvents = true;
    log.EntryWritten += DisplayEntry;
    Console.ReadLine();
  }
}

static void DisplayEntry (object sender, EntryWrittenEventArgs e)
{
  EventLogEntry entry = e.Entry;
  Console.WriteLine (entry.Message);
}


Performance Counters 


Performance Counters are a Windows-only feature and
require the NuGet package System.Diagnostics.PerformanceCounter.
If you’re targeting Linux or macOS, see “Cross-Platform Diagnostics
Tools” on page 569 for alternatives.


The logging mechanisms we've discussed to date are useful for capturing informa- 
tion for future analysis. However, to gain insight into the current state of an applica- 
tion (or the system as a whole), a more real-time approach is needed. The Win32 
solution to this need is the performance-monitoring infrastructure, which consists 
of a set of performance counters that the system and applications expose, and the 
Microsoft Management Console (MMC) snap-ins used to monitor these counters in 
real time. 


Performance counters are grouped into categories such as System, Processor, .NET 
CLR Memory, and so on. These categories are sometimes also referred to as perfor- 
mance objects by the GUI tools. Each category groups a related set of performance 
counters that monitor one aspect of the system or application. Examples of
performance counters in the .NET CLR Memory category include “% Time in GC,”
“# Bytes in All Heaps,” and “Allocated bytes/sec.”
Each category can optionally have one or more instances that can be monitored
independently. For example, this is useful in the “% Processor Time” performance 
counter in the Processor category, which allows one to monitor CPU utilization. On 
a multiprocessor machine, this counter supports an instance for each CPU, allowing 
you to monitor the utilization of each CPU independently. 


The following sections illustrate how to perform commonly needed tasks, such as 
determining which counters are exposed, monitoring a counter, and creating your 
own counters to expose application status information. 


Reading performance counters or categories might require 
administrator privileges on the local or target computer, 
depending on what is accessed. 


Enumerating the Available Counters 


The following example enumerates over all of the available performance counters 
on the computer. For those that have instances, it enumerates the counters for each 
instance: 


PerformanceCounterCategory[] cats =
  PerformanceCounterCategory.GetCategories();

foreach (PerformanceCounterCategory cat in cats)
{
  Console.WriteLine ("Category: " + cat.CategoryName);

  string[] instances = cat.GetInstanceNames();
  if (instances.Length == 0)
  {
    foreach (PerformanceCounter ctr in cat.GetCounters())
      Console.WriteLine ("  Counter: " + ctr.CounterName);
  }
  else   // Dump counters with instances
  {
    foreach (string instance in instances)
    {
      Console.WriteLine ("  Instance: " + instance);
      if (cat.InstanceExists (instance))
        foreach (PerformanceCounter ctr in cat.GetCounters (instance))
          Console.WriteLine ("    Counter: " + ctr.CounterName);
    }
  }
}


The result is more than 10,000 lines long! It also takes a while
to execute because PerformanceCounterCategory.InstanceExists
has an inefficient implementation. In a real system, you’d
want to retrieve the more detailed information only on demand.


The next example uses LINQ to retrieve just .NET performance counters, writing 
the result to an XML file: 







// Requires the System.Linq and System.Xml.Linq namespaces
var x =
  new XElement ("counters",
    from PerformanceCounterCategory cat in
         PerformanceCounterCategory.GetCategories()
    where cat.CategoryName.StartsWith (".NET")
    let instances = cat.GetInstanceNames()
    select new XElement ("category",
      new XAttribute ("name", cat.CategoryName),
      instances.Length == 0
      ?
        from c in cat.GetCounters()
        select new XElement ("counter",
          new XAttribute ("name", c.CounterName))
      :
        from i in instances
        select new XElement ("instance", new XAttribute ("name", i),
          !cat.InstanceExists (i)
          ?
            null
          :
            from c in cat.GetCounters (i)
            select new XElement ("counter",
              new XAttribute ("name", c.CounterName))
        )
    )
  );

x.Save ("counters.xml");


Reading Performance Counter Data 


To retrieve the value of a performance counter, instantiate a PerformanceCounter object and then call the NextValue or NextSample method. NextValue returns a simple float value; NextSample returns a CounterSample object that exposes a more advanced set of properties, such as CounterFrequency, TimeStamp, BaseValue, and RawValue.


PerformanceCounter’s constructor takes a category name, counter name, and 
optional instance. So, to display the current processor utilization for all CPUs, you 
would do the following: 


using PerformanceCounter pc = new PerformanceCounter ("Processor",
                                                      "% Processor Time",
                                                      "_Total");
// NB: for a rate counter such as this one, the first NextValue call returns 0
Console.WriteLine (pc.NextValue());


Or to display the “real” (i.e., private) memory consumption of the current process: 


string procName = Process.GetCurrentProcess().ProcessName;

using PerformanceCounter pc = new PerformanceCounter ("Process",
                                                      "Private Bytes",
                                                      procName);

Console.WriteLine (pc.NextValue());
PerformanceCounter doesn't expose a ValueChanged event, so if you want to monitor for changes, you must poll. In the next example, we poll every 200 milliseconds until signaled to quit by an EventWaitHandle:


// need to import System.Threading as well as System.Diagnostics

static void Monitor (string category, string counter, string instance,
                     EventWaitHandle stopper)
{
  if (!PerformanceCounterCategory.Exists (category))
    throw new InvalidOperationException ("Category does not exist");

  if (!PerformanceCounterCategory.CounterExists (counter, category))
    throw new InvalidOperationException ("Counter does not exist");

  if (instance == null) instance = "";    // "" == no instance (not null!)
  if (instance != "" &&
      !PerformanceCounterCategory.InstanceExists (instance, category))
    throw new InvalidOperationException ("Instance does not exist");

  float lastValue = 0f;
  using (PerformanceCounter pc = new PerformanceCounter (category,
                                                          counter, instance))
    while (!stopper.WaitOne (200, false))
    {
      float value = pc.NextValue();
      if (value != lastValue)           // Only write out the value
      {                                 // if it has changed.
        Console.WriteLine (value);
        lastValue = value;
      }
    }
}


Here's how we can use this method to simultaneously monitor processor and hard-drive activity:


static void Main()
{
  EventWaitHandle stopper = new ManualResetEvent (false);

  new Thread (() =>
    Monitor ("Processor", "% Processor Time", "_Total", stopper)
  ).Start();

  new Thread (() =>
    Monitor ("LogicalDisk", "% Idle Time", "C:", stopper)
  ).Start();

  Console.WriteLine ("Monitoring - press any key to quit");
  Console.ReadKey();
  stopper.Set();
}
Creating Counters and Writing Performance Data


Before writing performance counter data, you need to create a performance category and counter. You must create the performance category along with all the counters that belong to it in one step, as follows:


string category = "Nutshell Monitoring";

// We'll create two counters in this category:
string eatenPerMin = "Macadamias eaten so far";
string tooHard = "Macadamias deemed too hard";

if (!PerformanceCounterCategory.Exists (category))
{
  CounterCreationDataCollection cd = new CounterCreationDataCollection();

  cd.Add (new CounterCreationData (eatenPerMin,
          "Number of macadamias consumed, including shelling time",
          PerformanceCounterType.NumberOfItems32));

  cd.Add (new CounterCreationData (tooHard,
          "Number of macadamias that will not crack, despite much effort",
          PerformanceCounterType.NumberOfItems32));

  PerformanceCounterCategory.Create (category, "Test Category",
    PerformanceCounterCategoryType.SingleInstance, cd);
}


The new counters then show up in the Windows performance-monitoring tool when you choose Add Counters. If you later want to define more counters in the same category, you must first delete the old category by calling PerformanceCounterCategory.Delete.


Creating and deleting performance counters requires
administrative privileges. For this reason, it’s usually done as
part of the application setup.


After you create a counter, you can update its value by instantiating a PerformanceCounter, setting ReadOnly to false, and setting RawValue. You can also use the Increment and IncrementBy methods to update the existing value:


string category = "Nutshell Monitoring";
string eatenPerMin = "Macadamias eaten so far";

using (PerformanceCounter pc = new PerformanceCounter (category,
                                                        eatenPerMin, ""))
{
  pc.ReadOnly = false;
  pc.RawValue = 1000;
  pc.Increment();
  pc.IncrementBy (10);
  Console.WriteLine (pc.NextValue());    // 1011
}
The Stopwatch Class


The Stopwatch class provides a convenient mechanism for measuring execution times. Stopwatch uses the highest-resolution mechanism that the OS and hardware provide, which is typically less than a microsecond. (In contrast, DateTime.Now and Environment.TickCount have a resolution of about 15 ms.)


To use Stopwatch, call StartNew—this instantiates a Stopwatch and starts it ticking. 
(Alternatively, you can instantiate it manually and then call Start.) The Elapsed 
property returns the elapsed interval as a TimeSpan: 


Stopwatch s = Stopwatch.StartNew();
System.IO.File.WriteAllText ("test.txt", new string ('*', 30000000));
Console.WriteLine (s.Elapsed);       // 00:00:01.4322661


Stopwatch also exposes an ElapsedTicks property, which returns the number of elapsed “ticks” as a long. To convert from ticks to seconds, divide by Stopwatch.Frequency. There’s also an ElapsedMilliseconds property, which is often the most convenient.


Calling Stop freezes Elapsed and ElapsedTicks. There’s no background activity 
incurred by a “running” Stopwatch, so calling Stop is optional. 
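As a minimal sketch of the tick arithmetic described above, the following program times a small loop, stops the Stopwatch, and converts ElapsedTicks to seconds by dividing by Stopwatch.Frequency. The StopwatchDemo class and its busy-loop workload are hypothetical, chosen only so that the example runs on any platform.

```csharp
using System;
using System.Diagnostics;

public static class StopwatchDemo
{
    // Times a small CPU-bound loop (a hypothetical workload), then returns the stopped Stopwatch.
    public static Stopwatch TimeWork()
    {
        Stopwatch s = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        s.Stop();   // freezes Elapsed and ElapsedTicks
        return s;
    }

    // Frequency is the number of Stopwatch ticks per second.
    public static double TicksToSeconds (long ticks) => (double) ticks / Stopwatch.Frequency;

    public static void Main()
    {
        Stopwatch s = TimeWork();
        Console.WriteLine ("Elapsed:             " + s.Elapsed);
        Console.WriteLine ("ElapsedMilliseconds: " + s.ElapsedMilliseconds);
        Console.WriteLine ("Seconds from ticks:  " + TicksToSeconds (s.ElapsedTicks));
        Console.WriteLine ("High resolution:     " + Stopwatch.IsHighResolution);
    }
}
```

Dividing by Frequency matters because Stopwatch ticks are not necessarily the 100-nanosecond ticks used by TimeSpan; the two coincide only when Frequency happens to be 10,000,000.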
Cross-Platform Diagnostics Tools


In this section, we briefly describe the cross-platform diagnostic tools available to .NET Core:


dotnet-counters 
Provides an overview of the state of a running application 


dotnet-trace 
For more detailed performance and event monitoring 


dotnet-dump 
To obtain a memory dump on demand or after a crash 


These tools do not require administrative elevation and are suitable for both development and production environments.


dotnet-counters 


The dotnet-counters tool monitors the memory and CPU usage of a .NET Core process and writes the data to the console (or a file).


To install the tool, run the following from a command prompt or terminal with dotnet in the path:


dotnet tool install --global dotnet-counters 


You can then start monitoring a process, as follows: 
dotnet-counters monitor System.Runtime --process-id <<ProcessID>>


System.Runtime means that we want to monitor all counters under the System.Runtime category. You can specify either a category or counter name (the dotnet-counters list command lists all available categories and counters).


The output is continually refreshed and looks like this:


Press p to pause, r to resume, q to quit.
    Status: Running

[System.Runtime]
    # of Assemblies Loaded                            63
    % Time in GC (since last GC)                       0
    Allocation Rate (Bytes / sec)                244,864
    CPU Usage (%)                                      6
    Exceptions / sec                                   0
    GC Heap Size (MB)                                  8
    Gen 0 GC / sec                                     0
    Gen 0 Size (B)                               265,176
    Gen 1 GC / sec                                     0
    Gen 1 Size (B)                               451,552
    Gen 2 GC / sec                                     0
    Gen 2 Size (B)                                    24
    LOH Size (B)                               3,200,296
    Monitor Lock Contention Count / sec                0
    Number of Active Timers                            0
    ThreadPool Completed Work Items / sec             15
    ThreadPool Queue Length                            0
    ThreadPool Threads Count                           9
    Working Set (MB)                                  52


Here are all available commands: 


Command    Purpose
list       Display a list of counter names along with a description of each
ps         Display a list of dotnet processes eligible for monitoring
monitor    Display values of selected counters (periodically refreshed)
collect    Save counter information to a file





The following parameters are supported: 


Options/arguments     Purpose
--version             Display the version of dotnet-counters.
-h, --help            Display help about the program.
-p, --process-id      ID of the dotnet process to monitor. Applies to the monitor and
                      collect commands.
--refresh-interval    Sets the desired refresh interval in seconds. Applies to the monitor
                      and collect commands.
-o, --output          Sets the output file name. Applies to the collect command.
--format              Sets the output format. Valid values are csv and json. Applies to
                      the collect command.





dotnet-trace 


Traces are timestamped records of events in your program, such as a method being called or a database being queried. Traces can also include performance metrics and custom events, and can contain local context such as the value of local variables. Traditionally, .NET Framework and frameworks such as ASP.NET used ETW (Event Tracing for Windows). In .NET Core, application traces are written to ETW when running on Windows and to LTTng on Linux.


To install the tool, run the following command:

dotnet tool install --global dotnet-trace

To start recording a program’s events, run the following command:


dotnet-trace collect --process-id <<ProcessId>> 


This runs dotnet-trace with the default profile, which collects CPU and .NET runtime events, and writes to a file called trace.nettrace. You can specify other profiles with the --profile switch: gc-verbose tracks garbage collection and sampled object allocation, and gc-collect tracks garbage collection with low overhead. The -o switch lets you specify a different output filename.


The default output is a .nettrace file, which can be analyzed directly on a Windows machine with the PerfView tool. Alternatively, you can instruct dotnet-trace to create a file compatible with Speedscope, which is a free online analysis service. To create a Speedscope (.speedscope.json) file, use the option --format speedscope.


You can download the latest version of PerfView on GitHub.
The version that ships with Windows 10 might not support
.nettrace files.


The following commands are supported:


Command          Purpose
collect          Starts recording trace information to a file.
ps               Displays a list of dotnet processes eligible for monitoring.
list-profiles    Lists prebuilt tracing profiles with a description of providers and
                 filters in each.
convert <file>   Converts a .nettrace file to an alternative format. Currently,
                 speedscope is the only target option.
Custom trace events
Your app can emit custom events by defining a custom EventSource: 


[EventSource (Name = "MyTestSource")]
public sealed class MyEventSource : EventSource
{
  public static MyEventSource Instance = new MyEventSource();

  MyEventSource() : base (EventSourceSettings.EtwSelfDescribingEventFormat)
  {
  }

  public void Log (string message, int someNumber)
  {
    WriteEvent (1, message, someNumber);
  }
}


The WriteEvent method is overloaded to accept various combinations of simple types (primarily strings and integers). You can then call it as follows:


MyEventSource.Instance.Log ("Something", 123); 


When calling dotnet-trace, you must specify the name(s) of any custom event sources that you want to record:


dotnet-trace collect --process-id <<ProcessId>> --providers MyTestSource 
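dotnet-trace observes these events from outside the process. As an alternative technique (not something dotnet-trace requires), the same EventSource can be observed in-process with System.Diagnostics.Tracing.EventListener, which is handy for a quick check or a unit test. This is a minimal sketch: it repeats the MyEventSource class from above so that it compiles on its own, and the CapturingListener and EventListenerDemo names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Tracing;

[EventSource (Name = "MyTestSource")]
public sealed class MyEventSource : EventSource
{
    public static readonly MyEventSource Instance = new MyEventSource();

    MyEventSource() : base (EventSourceSettings.EtwSelfDescribingEventFormat) { }

    public void Log (string message, int someNumber) => WriteEvent (1, message, someNumber);
}

// Hypothetical listener that records every event written by "MyTestSource".
public sealed class CapturingListener : EventListener
{
    // Field initializers run before the base constructor, so this list already
    // exists if OnEventSourceCreated is called while the listener is being constructed.
    public readonly List<string> Captured = new List<string>();

    protected override void OnEventSourceCreated (EventSource source)
    {
        if (source.Name == "MyTestSource")
            EnableEvents (source, EventLevel.Verbose);   // subscribe to this source only
    }

    protected override void OnEventWritten (EventWrittenEventArgs e)
    {
        // Payload holds the arguments passed to WriteEvent, in order.
        lock (Captured)
            Captured.Add (e.EventName + ": " + e.Payload[0] + ", " + e.Payload[1]);
    }
}

public static class EventListenerDemo
{
    public static List<string> CaptureOne()
    {
        using (var listener = new CapturingListener())
        {
            MyEventSource.Instance.Log ("Something", 123);
            lock (listener.Captured)
                return new List<string> (listener.Captured);
        }
    }

    public static void Main()
    {
        foreach (string line in CaptureOne()) Console.WriteLine (line);
    }
}
```

Because the listener enables only MyTestSource, events from the runtime’s own event sources are not delivered to it.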


dotnet-dump 


A dump, sometimes called a core dump, is a snapshot of the state of a process's virtual memory. You can dump a running process on demand, or configure the OS to generate a dump when an application crashes.


On Ubuntu Linux, the following command enables a core dump upon application 
crash (the necessary steps can vary between different flavors of Linux): 


ulimit -c unlimited 


On Windows, use regedit.exe to create or edit the following key in the local machine hive:


SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps


Under that, add a key with the same name as your executable (e.g., foo.exe), and under that key, add the following values:

• DumpFolder (REG_EXPAND_SZ), with a value indicating the path to which you want dump files written
• DumpType (REG_DWORD), with a value of 2 to request a full dump
• (Optionally) DumpCount (REG_DWORD), indicating the maximum number of dump files before the oldest is removed
To install the tool, run the following command:


dotnet tool install --global dotnet-dump 


After you've installed it, you can initiate a dump on demand (without ending the 
process), as follows: 


dotnet-dump collect --process-id <<YourProcessId>> 


The following command starts an interactive shell for analyzing a dump file: 


dotnet-dump analyze <<dumpfile>> 


If an exception took down the application, you can use the printexceptions command (pe for short) to display details of that exception. The dotnet-dump shell supports numerous additional commands, which you can list with the help command.
14


Concurrency and Asynchrony

Most applications need to deal with more than one thing happening at a time (concurrency). In this chapter, we start with the essential prerequisites, namely the basics of threading and tasks, and then describe in detail the principles of asynchrony and C#’s asynchronous functions.


In Chapter 22, we revisit multithreading in greater detail, and in Chapter 23, we 
cover the related topic of parallel programming. 


Introduction 


Following are the most common concurrency scenarios: 


Writing a responsive user interface
In WPF, mobile, and Windows Forms applications, you must run time-consuming tasks concurrently with the code that runs your user interface to maintain responsiveness.


Allowing requests to process simultaneously 
On a server, client requests can arrive concurrently and so must be handled in 
parallel to maintain scalability. If you use ASP.NET Core or Web API, .NET 
Core does this for you automatically. However, you still need to be aware of 
shared state (for instance, the effect of using static variables for caching). 


Parallel programming
Code that performs intensive calculations can execute faster on multicore/multiprocessor computers if the workload is divided between cores (Chapter 23 is dedicated to this).


Speculative execution
On multicore machines, you can sometimes improve performance by predicting something that might need to be done and then doing it ahead of time. LINQPad uses this technique to speed up the creation of new queries. A variation is to run a number of different algorithms in parallel that all solve the same task. Whichever one finishes first “wins”; this is effective when you can’t know ahead of time which algorithm will execute fastest.


The general mechanism by which a program can simultaneously execute code is called multithreading. Multithreading is supported by both the CLR and the operating system and is a fundamental concept in concurrency. Understanding the basics of threading, and in particular, the effects of threads on shared state, is therefore essential.


Threading 


A thread is an execution path that can proceed independently of others. 


Each thread runs within an operating system process, which provides an isolated 
environment in which a program runs. With a single-threaded program, just one 
thread runs in the process’s isolated environment and so that thread has exclusive 
access to it. With a multithreaded program, multiple threads run in a single process, 
sharing the same execution environment (memory, in particular). This, in part, is 
why multithreading is useful: one thread can fetch data in the background, for 
instance, while another thread displays the data as it arrives. This data is referred to 
as shared state. 


Creating a Thread 


A client program (Console, WPF, UWP, or Windows Forms) starts in a single thread that’s created automatically by the OS (the “main” thread). Here it lives out its life as a single-threaded application, unless you do otherwise, by creating more threads (directly or indirectly).¹


You can create and start a new thread by instantiating a Thread object and calling its 
Start method. The simplest constructor for Thread takes a ThreadStart delegate: a 
parameterless method indicating where execution should begin. Here's an example: 


// NB: All samples in this chapter assume the following namespace imports:

using System;
using System.Threading;

class ThreadTest
{
  static void Main()
  {
    Thread t = new Thread (WriteY);          // Kick off a new thread
    t.Start();                               // running WriteY()

    // Simultaneously, do something on the main thread.
    for (int i = 0; i < 1000; i++) Console.Write ("x");
  }

  static void WriteY()
  {
    for (int i = 0; i < 1000; i++) Console.Write ("y");
  }
}

// Typical Output:
xxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


1 The CLR creates other threads behind the scenes for garbage collection and finalization.


The main thread creates a new thread t on which it runs a method that repeatedly prints the character y. Simultaneously, the main thread repeatedly prints the character x, as shown in Figure 14-1. On a single-core computer, the operating system must allocate “slices” of time to each thread (typically 20 ms in Windows) to simulate concurrency, resulting in repeated blocks of x and y. On a multicore or multiprocessor machine, the two threads can genuinely execute in parallel (subject to competition by other active processes on the computer), although you still get repeated blocks of x and y in this example because of subtleties in the mechanism by which Console handles concurrent requests.


Figure 14-1. Starting a new thread


A thread is said to be preempted at the points at which its
execution is interspersed with the execution of code on another
thread. The term often crops up in explaining why something
has gone wrong!


After it’s started, a thread’s IsAlive property returns true, until the point at which 
the thread ends. A thread ends when the delegate passed to the Thread’s constructor 
finishes executing. After it’s ended, a thread cannot restart. 


Each thread has a Name property that you can set for the benefit of debugging. This is particularly useful in Visual Studio because the thread’s name is displayed in the Threads Window and Debug Location toolbar. You can set a thread’s name just once; attempts to change it later will throw an exception.


The static Thread.CurrentThread property gives you the currently executing 
thread: 


Console.WriteLine (Thread.CurrentThread.Name) ; 
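To see Name and Thread.CurrentThread working together, the sketch below names a worker thread before starting it and reads the name back from inside that thread. ThreadNameDemo and the name “Worker” are hypothetical; Join (described next) is used only so that the result can be read safely afterward.

```csharp
using System;
using System.Threading;

public static class ThreadNameDemo
{
    // Names a worker thread before starting it, then reads the name back
    // from inside the worker via Thread.CurrentThread.
    public static string NameSeenByWorker()
    {
        string seen = null;
        Thread worker = new Thread (() => seen = Thread.CurrentThread.Name);
        worker.Name = "Worker";   // shown in the debugger's Threads window
        worker.Start();
        worker.Join();            // wait for the worker so that 'seen' has been written
        return seen;
    }

    public static void Main() => Console.WriteLine (NameSeenByWorker());   // Worker
}
```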


Join and Sleep 
You can wait for another thread to end by calling its Join method: 


static void Main()
{
  Thread t = new Thread (Go);
  t.Start();
  t.Join();
  Console.WriteLine ("Thread t has ended!");
}

static void Go() { for (int i = 0; i < 1000; i++) Console.Write ("y"); }


This prints “y” 1,000 times, followed by “Thread t has ended!” immediately afterward. You can include a timeout when calling Join, either in milliseconds or as a TimeSpan. It then returns true if the thread ended or false if it timed out.
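The following sketch shows both outcomes of a timed Join. To make the timeout deterministic, the worker blocks on a ManualResetEvent (a signaling construct, used here only to keep the worker waiting) rather than doing real work; JoinTimeoutDemo is a hypothetical name.

```csharp
using System;
using System.Threading;

public static class JoinTimeoutDemo
{
    // Returns the results of two Join calls: one that times out and one that succeeds.
    public static (bool first, bool second) Run()
    {
        using (var release = new ManualResetEvent (false))
        {
            Thread t = new Thread (() => release.WaitOne());   // blocks until signaled
            t.Start();

            bool first = t.Join (TimeSpan.FromMilliseconds (100));   // false: t is still blocked
            release.Set();                                           // allow t to finish
            bool second = t.Join (5000);                             // true: t ends well within 5 s
            return (first, second);
        }
    }

    public static void Main() => Console.WriteLine (Run());   // (False, True)
}
```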


Thread.Sleep pauses the current thread for a specified period:


Thread.Sleep (TimeSpan.FromHours (1)); // Sleep for 1 hour 
Thread.Sleep (500); // Sleep for 500 milliseconds 


Thread.Sleep(0) relinquishes the thread’s current time slice immediately, voluntarily handing over the CPU to other threads. Thread.Yield() does the same thing except that it relinquishes only to threads running on the same processor.


Sleep(0) or Yield is occasionally useful in production code
for advanced performance tweaks. It’s also an excellent
diagnostic tool for helping to uncover thread safety issues: if
inserting Thread.Yield() anywhere in your code breaks the
program, you almost certainly have a bug.


While waiting on a Sleep or Join, a thread is blocked. 


Blocking 


A thread is deemed blocked when its execution is paused for some reason, such as when Sleeping or waiting for another to end via Join. A blocked thread immediately yields its processor time slice, and from then on it consumes no processor time until its blocking condition is satisfied. You can test for a thread being blocked via its ThreadState property:


bool blocked = (someThread.ThreadState & ThreadState.WaitSleepJoin) != 0; 
ThreadState is a flags enum, combining three “layers” of data
in a bitwise fashion. Most values, however, are redundant,
unused, or deprecated. The following extension method strips
a ThreadState to one of four useful values: Unstarted,
Running, WaitSleepJoin, and Stopped:

public static ThreadState Simplify (this ThreadState ts)
{
  return ts & (ThreadState.Unstarted |
               ThreadState.WaitSleepJoin |
               ThreadState.Stopped);
}

The ThreadState property is useful for diagnostic purposes
but unsuitable for synchronization, because a thread’s state
can change in between testing ThreadState and acting on that
information.
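Here is a diagnostics-only sketch that applies the Simplify extension method from the note at three points in a thread’s life: before Start, while the thread is blocked on a wait handle, and after Join. The polling loop is purely illustrative; as the note says, ThreadState should not be used for synchronization. ThreadStateDemo is a hypothetical name.

```csharp
using System;
using System.Threading;

public static class ThreadStateExtensions
{
    // The extension method from the text. Running has the value 0, so it is
    // what remains when none of the three retained bits are set.
    public static ThreadState Simplify (this ThreadState ts) =>
        ts & (ThreadState.Unstarted | ThreadState.WaitSleepJoin | ThreadState.Stopped);
}

public static class ThreadStateDemo
{
    public static ThreadState[] Observe()
    {
        using (var gate = new ManualResetEvent (false))
        {
            var t = new Thread (() => gate.WaitOne()) { IsBackground = true };

            ThreadState before = t.ThreadState.Simplify();   // Unstarted
            t.Start();

            // Poll (for diagnostics only) until the worker is blocked in WaitOne, for up to ~5 s.
            ThreadState blocked = t.ThreadState.Simplify();
            for (int i = 0; i < 500 && blocked != ThreadState.WaitSleepJoin; i++)
            {
                Thread.Sleep (10);
                blocked = t.ThreadState.Simplify();
            }

            gate.Set();
            t.Join();
            ThreadState after = t.ThreadState.Simplify();    // Stopped
            return new[] { before, blocked, after };
        }
    }

    public static void Main() => Console.WriteLine (string.Join (", ", Observe()));
}
```

On a typical run this prints Unstarted, WaitSleepJoin, Stopped. Simplify also discards the Background bit set by IsBackground, which is why the first value is plain Unstarted.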


When a thread blocks or unblocks, the OS performs a context switch. This incurs a 
small overhead, typically one or two microseconds. 


I/O-bound versus compute-bound


An operation that spends most of its time waiting for something to happen is called 
I/O-bound—an example is downloading a web page or calling Console.ReadLine. 
(I/O-bound operations typically involve input or output, but this is not a hard 
requirement: Thread.Sleep is also deemed I/O-bound.) In contrast, an operation 
that spends most of its time performing CPU-intensive work is called 
compute-bound. 


Blocking versus spinning 


An I/O-bound operation works in one of two ways: it either waits synchronously on the current thread until the operation is complete (such as Console.ReadLine, Thread.Sleep, or Thread.Join), or operates asynchronously, firing a callback when the operation finishes some time thereafter (more on this later).


I/O-bound operations that wait synchronously spend most of their time blocking a 
thread. They can also “spin” in a loop periodically: 


while (DateTime.Now < nextStartTime) 
Thread.Sleep (100); 


Leaving aside that there are better ways to do this (such as timers or signaling constructs), another option is that a thread can spin continuously:


while (DateTime.Now < nextStartTime); 


In general, this is very wasteful of processor time: as far as the CLR and OS are concerned, the thread is performing an important calculation and thus is allocated resources accordingly. In effect, we've turned what should be an I/O-bound operation into a compute-bound operation.
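When the goal is simply to resume at a known time, a thread can block for the remaining interval instead of polling or spinning. This is a minimal sketch, assuming a DateTime deadline like nextStartTime above; WaitUntil is a hypothetical helper and a plain alternative to the timers and signaling constructs the text mentions, not a replacement for them. The loop re-checks only in case the sleep ends marginally before the deadline (Thread.Sleep works in whole milliseconds).

```csharp
using System;
using System.Threading;

public static class DeadlineWait
{
    // Blocks the current thread until 'nextStartTime' instead of polling or spinning.
    // While blocked in Sleep, the thread consumes no processor time.
    public static void WaitUntil (DateTime nextStartTime)
    {
        TimeSpan remaining;
        while ((remaining = nextStartTime - DateTime.Now) > TimeSpan.Zero)
            Thread.Sleep (remaining);
    }

    public static void Main()
    {
        DateTime nextStartTime = DateTime.Now.AddMilliseconds (250);
        WaitUntil (nextStartTime);
        Console.WriteLine ("Reached " + nextStartTime.ToString ("HH:mm:ss.fff"));
    }
}
```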
There are a couple of nuances with regard to spinning versus
blocking. First, spinning very briefly can be effective when you
expect a condition to be satisfied soon (perhaps within a few
microseconds) because it avoids the overhead and latency of a
context switch. .NET Core provides special methods and
classes to assist; see “SpinLock and SpinWait”.


Second, blocking does not incur a zero cost. This is because
each thread ties up around 1 MB of memory for as long as it
lives and causes an ongoing administrative overhead for the
CLR and OS. For this reason, blocking can be troublesome in
the context of heavily I/O-bound programs that need to handle
hundreds or thousands of concurrent operations. Instead,
such programs need to use a callback-based approach,
rescinding their thread entirely while waiting. This is (in part)
the purpose of the asynchronous patterns that we discuss later.


Local versus Shared State 


The CLR assigns each thread its own memory stack so that local variables are kept 
separate. In the next example, we define a method with a local variable and then call 
the method simultaneously on the main thread and a newly created thread: 


static void Main()
{
  new Thread (Go).Start();      // Call Go() on a new thread
  Go();                         // Call Go() on the main thread
}

static void Go()
{
  // Declare and use a local variable - 'cycles'
  for (int cycles = 0; cycles < 5; cycles++) Console.Write ('?');
}


A separate copy of the cycles variable is created on each thread’s memory stack, 
and so the output is, predictably, 10 question marks. 


Threads share data if they have a common reference to the same object instance: 


class ThreadTest
{
  bool _done;

  static void Main()
  {
    ThreadTest tt = new ThreadTest();   // Create a common instance
    new Thread (tt.Go).Start();
    tt.Go();
  }

  void Go()   // Note that this is an instance method
  {
    if (!_done) { _done = true; Console.WriteLine ("Done"); }
  }
}


Because both threads call Go() on the same ThreadTest instance, they share the 
_done field. This results in “Done” being printed once instead of twice. 


Local variables captured by a lambda expression or anonymous delegate are converted by the compiler into fields and so can also be shared:


static void Main()
{
  bool done = false;
  ThreadStart action = () =>
  {
    if (!done) { done = true; Console.WriteLine ("Done"); }
  };
  new Thread (action).Start();
  action();
}


Static fields offer another way to share data between threads: 


class ThreadTest
{
  static bool _done;    // Static fields are shared between all threads
                        // in the same application domain.
  static void Main()
  {
    new Thread (Go).Start();
    Go();
  }

  static void Go()
  {
    if (!_done) { _done = true; Console.WriteLine ("Done"); }
  }
}


All three examples illustrate another key concept: that of thread safety (or rather, 
lack of it!). The output is actually indeterminate: it’s possible (though unlikely) that 
“Done” could be printed twice. If, however, we swap the order of statements in the 
Go method, the odds of “Done” being printed twice go up dramatically: 


static void Go() 


{ 


if (!_done) { Console.WriteLine ("Done"); _done = true; } 


} 


The problem is that one thread can be evaluating the if statement at exactly the same time as the other thread is executing the WriteLine statement, before it’s had a chance to set _done to true.
Our example illustrates one of many ways that shared writable
state can introduce the kind of intermittent errors for which
multithreading is notorious. Next, we look at how to fix our
program by locking; however, it’s better to avoid shared state
altogether where possible. We see later how asynchronous
programming patterns help with this.


Locking and Thread Safety 


Locking and thread safety are large topics. For a full
discussion, see “Exclusive Locking” on page 882 and “Locking and
Thread Safety” on page 890 in Chapter 22.


We can fix the previous example by obtaining an exclusive lock while reading and writing to the shared field. C# provides the lock statement for just this purpose:


class ThreadSafe
{
  static bool _done;
  static readonly object _locker = new object();

  static void Main()
  {
    new Thread (Go).Start();
    Go();
  }

  static void Go()
  {
    lock (_locker)
    {
      if (!_done) { Console.WriteLine ("Done"); _done = true; }
    }
  }
}


When two threads simultaneously contend a lock (which can be upon any 
reference-type object; in this case, _locker), one thread waits, or blocks, until the 
lock becomes available. In this case, it ensures that only one thread can enter its 
code block at a time, and “Done” will be printed just once. Code that’s protected in 
such a manner—from indeterminacy in a multithreaded context—is called 
thread-safe. 


Even the act of autoincrementing a variable is not thread-safe:
the expression x++ executes on the underlying processor as
distinct read-increment-write operations. So, if two threads
execute x++ at once outside a lock, the variable can end up
getting incremented once rather than twice (or worse, x could be
torn, ending up with a bitwise mixture of old and new content,
under certain conditions).
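The lost-update problem described in the note can be reproduced with two threads that each increment a shared counter, and fixed with the lock statement. This is a minimal sketch; IncrementDemo is a hypothetical name, and the unlocked total varies from run to run (on some machines it may even come out correct).

```csharp
using System;
using System.Threading;

public static class IncrementDemo
{
    static readonly object _locker = new object();

    // Two threads each increment a shared counter 'iterations' times.
    // Without the lock, the separate read, increment, and write steps of x++
    // can interleave, so some increments are lost.
    public static int Run (int iterations, bool useLock)
    {
        int x = 0;   // captured by the lambda, so both threads share it
        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                if (useLock)
                    lock (_locker) x++;   // one thread at a time performs the read-increment-write
                else
                    x++;                  // not thread-safe
            }
        };

        Thread a = new Thread (work), b = new Thread (work);
        a.Start(); b.Start();
        a.Join(); b.Join();
        return x;
    }

    public static void Main()
    {
        Console.WriteLine ("Without lock: " + Run (1_000_000, false));   // often less than 2000000
        Console.WriteLine ("With lock:    " + Run (1_000_000, true));    // always 2000000
    }
}
```

For a single counter like this, Interlocked.Increment (ref x) is a lighter-weight alternative to lock.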
Locking is not a silver bullet for thread safety: it’s easy to forget to lock around accessing a field, and locking can create problems of its own (such as deadlocking).


A good example of when you might use locking is around accessing a shared in-memory cache for frequently accessed database objects in an ASP.NET application. This kind of application is simple to get right, and there’s no chance of deadlocking. We give an example in “Thread Safety in Application Servers” on page 893.


Passing Data to a Thread 


Sometimes, you'll want to pass arguments to the thread’s startup method. The easiest way to do this is with a lambda expression that calls the method with the desired arguments:


static void Main()
{
  Thread t = new Thread ( () => Print ("Hello from t!") );
  t.Start();
}

static void Print (string message) { Console.WriteLine (message); }


With this approach, you can pass in any number of arguments to the method. You 
can even wrap the entire implementation in a multistatement lambda: 


new Thread (() =>
{
  Console.WriteLine ("I'm running on another thread!");
  Console.WriteLine ("This is so easy!");
}).Start();
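Because the lambda calls the method directly, its arguments keep their static types. The following sketch (Describe and LambdaArgsDemo are hypothetical names) passes three arguments of different types, then uses Join to wait for the thread before reading the result it wrote to a captured local variable:

```csharp
using System;
using System.Threading;

static class LambdaArgsDemo
{
  // Hypothetical worker method taking several strongly typed arguments.
  static string Describe (string name, int count, bool loud) =>
    loud ? $"{name.ToUpperInvariant()} x{count}!" : $"{name} x{count}";

  public static string Run()
  {
    string result = null;
    // The lambda supplies all three arguments; no casting is required.
    var t = new Thread (() => result = Describe ("hello", 3, loud: true));
    t.Start();
    t.Join();      // wait for the thread, so 'result' has been assigned
    return result;
  }
}
```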
An alternative (and less flexible) technique is to pass an argument into Thread's
Start method: 


static void Main()
{
  Thread t = new Thread (Print);
  t.Start ("Hello from t!");
}

static void Print (object messageObj)
{
  string message = (string) messageObj;   // We need to cast here
  Console.WriteLine (message);
}


This works because Thread’s constructor is overloaded to accept either of two 
delegates: 


public delegate void ThreadStart(); 
public delegate void ParameterizedThreadStart (object obj); 
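When you pass a method group such as Print to Thread’s constructor, the compiler picks between these delegate types based on the method’s signature. The sketch below (ParameterizedDemo is our own name) spells out the ParameterizedThreadStart conversion explicitly and joins the thread before reading what it stored:

```csharp
using System;
using System.Threading;

static class ParameterizedDemo
{
  static string _received;

  // Matches ParameterizedThreadStart: returns void, takes a single object parameter.
  static void Print (object messageObj) => _received = (string) messageObj;

  public static string Run()
  {
    var t = new Thread (new ParameterizedThreadStart (Print));
    t.Start ("Hello from t!");   // this argument reaches Print typed as 'object'
    t.Join();                    // wait until Print has stored the message
    return _received;
  }
}
```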
Lambda expressions and captured variables


As we saw, a lambda expression is the most convenient and powerful way to pass
data to a thread. However, you must be careful about accidentally modifying captured
variables after starting the thread. For instance, consider the following:


for (int i = 0; i < 10; i++) 
new Thread (() => Console.Write (i)).Start(); 


The output is nondeterministic! Here’s a typical result: 
0223557799 


The problem is that the i variable refers to the same memory location throughout 
the loop’s lifetime. Therefore, each thread calls Console.Write on a variable whose 
value can change as it is running! The solution is to use a temporary variable as 
follows: 


for (int i = 0; i < 10; i++)
{
  int temp = i;
  new Thread (() => Console.Write (temp)).Start();
}


Each of the digits 0 to 9 is then written exactly once. (The ordering is still undefined 
because threads can start at indeterminate times.) 
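To check that claim without relying on console output, the following sketch (CaptureDemo is our own name) has each thread add its captured temp to a ConcurrentBag<int>, joins all ten threads, and sorts the result, since the arrival order varies from run to run. Incidentally, a foreach loop variable does not have this problem: since C# 5, each foreach iteration gets a fresh variable, whereas a for loop’s variable is still shared across iterations.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static class CaptureDemo
{
  public static int[] Run()
  {
    var seen = new ConcurrentBag<int>();     // thread-safe collection
    var threads = new List<Thread>();

    for (int i = 0; i < 10; i++)
    {
      int temp = i;                          // a fresh variable for each iteration
      var t = new Thread (() => seen.Add (temp));
      threads.Add (t);
      t.Start();
    }

    foreach (var t in threads) t.Join();     // wait for all ten threads
    return seen.OrderBy (n => n).ToArray();  // arrival order varies, so sort
  }
}
```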


This is analogous to the problem we described in “Captured 
Variables” on page 384 in Chapter 8. The problem is just as 
much about C#’s rules for capturing variables in for loops as it 
is about multithreading. 


Variable temp is now local to each loop iteration. Therefore, each thread captures a 
different memory location and there’s no problem. We can illustrate the problem in 
the earlier code more simply with the following example: 


string text = "t1";
Thread t1 = new Thread ( () => Console.WriteLine (text) );

text = "t2";
Thread t2 = new Thread ( () => Console.WriteLine (text) );

t1.Start(); t2.Start();


Because both lambda expressions capture the same text variable, t2 is printed twice. 


Exception Handling 


Any try/catch/finally blocks in effect when a thread is created are of no relevance 
to the thread when it starts executing. Consider the following program: 


public static void Main()
{
  try
  {
    new Thread (Go).Start();
  }
  catch (Exception ex)
  {
    // We'll never get here!
    Console.WriteLine ("Exception!");
  }
}

static void Go() { throw null; }   // Throws a NullReferenceException


The try/catch statement in this example is ineffective, and the newly created thread 
will be encumbered with an unhandled NullReferenceException. This behavior 
makes sense when you consider that each thread has an independent execution 
path. 


The remedy is to move the exception handler into the Go method: 


public static void Main()
{
  new Thread (Go).Start();
}

static void Go()
{
  try
  {
    throw null;    // The NullReferenceException will get caught below
  }
  catch (Exception ex)
  {
    // Typically log the exception, and/or signal another thread
    // that we've come unstuck
  }
}


You need an exception handler on all thread entry methods in production applications,
just as you do (usually at a higher level in the execution stack) on your main
thread. An unhandled exception causes the whole application to shut down, with
an ugly dialog box!
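One way to make sure every thread entry method has such a handler is to route thread bodies through a small wrapper. The SafeThread helper below is a hypothetical sketch, not a .NET API: it runs the supplied body inside try/catch and hands any exception to a callback, where you might log it or signal another thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not a .NET API): runs 'body' on a new thread and routes any
// exception to 'onError' instead of letting it terminate the process.
static class SafeThread
{
  public static Thread Start (Action body, Action<Exception> onError)
  {
    var t = new Thread (() =>
    {
      try
      {
        body();
      }
      catch (Exception ex)
      {
        onError (ex);   // e.g., log the details and/or signal another thread
      }
    });
    t.Start();
    return t;
  }
}
```

Note that onError itself runs on the worker thread, so it should not throw; an exception escaping the callback would again be unhandled.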


In writing such exception handling blocks, rarely would you
ignore the error: typically, you’d log the details of the exception.
For a client application, you might display a dialog box
allowing the user to automatically submit those details to your
web server. You then might choose to restart the application,
because it’s possible that an unexpected exception might leave
your program in an invalid state.
Centralized exception handling


In WPF and Windows Forms applications, you can subscribe to the global
exception handling events Application.DispatcherUnhandledException and
Application.ThreadException, respectively. These fire after an unhandled exception
in any part of your program that’s called via the message loop (this amounts to
all code that runs on the main thread while the Application is active). This is useful
as a backstop for logging and reporting bugs (although it won't fire for unhandled
exceptions on non-UI threads that you create). Handling these events prevents the
program from shutting down, although you may choose to restart the application to
avoid the potential corruption of state that can follow from (or that led to) the
unhandled exception.


Foreground versus Background Threads 


By default, threads you create explicitly are foreground threads. Foreground threads
keep the application alive for as long as any one of them is running, whereas background
threads do not. After all foreground threads finish, the application ends, and
any background threads still running abruptly terminate.


A thread’s foreground/background status has no relation to its 
priority (allocation of execution time). 


You can query or change a thread’s background status using its IsBackground 
property: 


static void Main (string[] args)
{
  Thread worker = new Thread ( () => Console.ReadLine() );
  if (args.Length > 0) worker.IsBackground = true;
  worker.Start();
}


If this program is called with no arguments, the worker thread assumes foreground
status and will wait on the ReadLine statement for the user to press Enter. Meanwhile,
the main thread exits, but the application keeps running because a foreground
thread is still alive. On the other hand, if an argument is passed to Main(),
the worker is assigned background status, and the program exits almost immediately
as the main thread ends (terminating the ReadLine).


When a process terminates in this manner, any finally blocks in the execution
stack of background threads are circumvented. If your program employs finally
(or using) blocks to perform cleanup work such as deleting temporary files, you can
avoid this by explicitly waiting out such background threads upon exiting an application,
either by joining the thread or with a signaling construct (see “Signaling” on
page 587). In either case, you should specify a timeout, so you can abandon a renegade
thread should it refuse to finish; otherwise, your application will fail to close
without the user having to enlist help from the Task Manager (or on Unix, the kill
command).
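Joining with a timeout can be sketched as follows (ShutdownDemo and its parameters are illustrative). Thread.Join(TimeSpan) returns false if the thread is still running when the timeout elapses, at which point you can decide to abandon it:

```csharp
using System;
using System.Threading;

static class ShutdownDemo
{
  // Starts a background worker, then waits at most 'timeout' for it to finish.
  // Returns true if it finished in time; false means you may choose to abandon it.
  public static bool RunWorker (int workMilliseconds, TimeSpan timeout)
  {
    var worker = new Thread (() =>
    {
      try
      {
        Thread.Sleep (workMilliseconds);   // stand-in for real work
      }
      finally
      {
        // Cleanup (e.g., deleting temporary files) goes here. It runs only if the
        // thread is allowed to finish before the process exits.
      }
    });
    worker.IsBackground = true;            // doesn't keep the process alive by itself
    worker.Start();
    return worker.Join (timeout);
  }
}
```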
Foreground threads don't require this treatment, but you must take care to avoid
bugs that could cause the thread not to end. A common cause for applications failing
to exit properly is the presence of active foreground threads.


Thread Priority 


A thread’s Priority property determines how much execution time it is allotted relative
to other active threads in the OS, on the following scale:


enum ThreadPriority { Lowest, BelowNormal, Normal, AboveNormal, Highest } 


This becomes relevant when multiple threads are simultaneously active. You need to 
take care when elevating a thread's priority because it can starve other threads. If 
you want a thread to have higher priority than threads in other processes, you must 
also elevate the process priority using the Process class in System.Diagnostics: 


using Process p = Process.GetCurrentProcess();
p.PriorityClass = ProcessPriorityClass.High; 


This can work well for non-UI processes that do minimal work and need low
latency (the ability to respond very quickly) in the work they do. With compute-hungry
applications (particularly those with a user interface), elevating process priority
can starve other processes, slowing down the entire computer.


Signaling 


Sometimes, you need a thread to wait until receiving notification(s) from other
thread(s). This is called signaling. The simplest signaling construct is ManualResetEvent.
Calling WaitOne on a ManualResetEvent blocks the current thread until
another thread “opens” the signal by calling Set. In the following example, we start
up a thread that waits on a ManualResetEvent. It remains blocked for two seconds
until the main thread signals it:


var signal = new ManualResetEvent (false);

new Thread (() =>
{
  Console.WriteLine ("Waiting for signal...");
  signal.WaitOne();
  signal.Dispose();
  Console.WriteLine ("Got signal!");
}).Start();

Thread.Sleep (2000);
signal.Set();   // "Open" the signal


After calling Set, the signal remains open; you can close it again by calling Reset. 
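The open/closed behavior can be checked without a second thread by calling WaitOne with a zero timeout, which returns immediately: true if the signal is open, false if it is closed. A sketch (SignalDemo is our own name):

```csharp
using System.Threading;

static class SignalDemo
{
  // Returns the signal's state at four points; WaitOne(0) tests it without blocking.
  public static bool[] Run()
  {
    using var signal = new ManualResetEvent (false);
    bool initially = signal.WaitOne (0);    // false: created closed
    signal.Set();                           // open the signal
    bool afterSet = signal.WaitOne (0);     // true
    bool stillOpen = signal.WaitOne (0);    // true: a ManualResetEvent stays open
    signal.Reset();                         // close it again
    bool afterReset = signal.WaitOne (0);   // false
    return new[] { initially, afterSet, stillOpen, afterReset };
  }
}
```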


ManualResetEvent is one of several signaling constructs provided by the CLR; we 
cover all of them in detail in Chapter 22. 
Threading in Rich Client Applications


In WPF, UWP, and Windows Forms applications, executing long-running operations
on the main thread makes the application unresponsive because the main
thread also processes the message loop that performs rendering and handles keyboard
and mouse events.


A popular approach is to start up “worker” threads for time-consuming operations. 
The code on a worker thread runs a time-consuming operation and then updates 
the UI when complete. However, all rich client applications have a threading model 
whereby UI elements and controls can be accessed only from the thread that created 
them (typically the main UI thread). Violating this causes either unpredictable
behavior or an exception to be thrown.


Hence, when you want to update the UI from a worker thread, you must forward the
request to the UI thread (the technical term is marshal). The low-level way to do
this is as follows (later, we discuss other solutions that build on these):


• In WPF, call BeginInvoke or Invoke on the element’s Dispatcher object.
• In UWP apps, call RunAsync or Invoke on the Dispatcher object.
• In Windows Forms, call BeginInvoke or Invoke on the control.


All of these methods accept a delegate referencing the method you want to run. 
BeginInvoke/RunAsync work by enqueuing the delegate to the UI thread’s message 
queue (the same queue that handles keyboard, mouse, and timer events). Invoke 
does the same thing, but then blocks until the message has been read and processed 
by the UI thread. Because of this, Invoke lets you get a return value back from the 
method. If you don’t need a return value, BeginInvoke/RunAsync are preferable in 
that they don't block the caller and don't introduce the possibility of deadlock (see 
“Deadlocks” on page 888 in Chapter 22). 


You can imagine that when you call Application.Run, the following
pseudo-code executes:

while (!thisApplication.Ended)
{
  wait for something to appear in message queue
  Got something: what kind of message is it?
    Keyboard/mouse message -> fire an event handler
    User BeginInvoke message -> execute delegate
    User Invoke message -> execute delegate & post result
}

It’s this kind of loop that enables a worker thread to marshal a
delegate for execution onto the UI thread.


To demonstrate, suppose that we have a WPF window that contains a text box called
txtMessage, whose content we want a worker thread to update after performing a
time-consuming task (which we will simulate by calling Thread.Sleep). Here’s how
we’d do it:
partial class MyWindow : Window
{
  public MyWindow()
  {
    InitializeComponent();
    new Thread (Work).Start();
  }

  void Work()
  {
    Thread.Sleep (5000);           // Simulate time-consuming task
    UpdateMessage ("The answer");
  }

  void UpdateMessage (string message)
  {
    Action action = () => txtMessage.Text = message;
    Dispatcher.BeginInvoke (action);
  }
}


Running this results in a responsive window appearing immediately. Five seconds
later, it updates the text box. The code is similar for Windows Forms, except that we
call the (Forms) BeginInvoke method instead:


void UpdateMessage (string message)
{
  Action action = () => txtMessage.Text = message;
  this.BeginInvoke (action);
}
Multiple UI Threads


It's possible to have multiple UI threads if they each own different windows. The 
main scenario is when you have an application with multiple top-level windows, 
often called a Single Document Interface (SDI) application, such as Microsoft Word. 
Each SDI window typically shows itself as a separate “application” on the taskbar 
and is mostly isolated, functionally, from other SDI windows. By giving each such 
window its own UI thread, each window can be made more responsive with respect 
to the others. 











Synchronization Contexts 


In the System.Threading namespace, there’s a class called SynchronizationContext,
which enables the generalization of thread marshaling.


The rich-client APIs for mobile and desktop (UWP, WPF, and Windows Forms)
each define and instantiate SynchronizationContext subclasses, which you can
obtain via the static property SynchronizationContext.Current (while running on
a UI thread). Capturing this property lets you later post to UI controls from a worker
thread:
partial class MyWindow : Window
{
  SynchronizationContext _uiSyncContext;

  public MyWindow()
  {
    InitializeComponent();
    // Capture the synchronization context for the current UI thread:
    _uiSyncContext = SynchronizationContext.Current;
    new Thread (Work).Start();
  }

  void Work()
  {
    Thread.Sleep (5000);           // Simulate time-consuming task
    UpdateMessage ("The answer");
  }

  void UpdateMessage (string message)
  {
    // Marshal the delegate to the UI thread:
    _uiSyncContext.Post (_ => txtMessage.Text = message, null);
  }
}


This is useful because the same technique works with all rich-client user interface
APIs.


Calling Post is equivalent to calling BeginInvoke on a Dispatcher or Control; 
there's also a Send method which is equivalent to Invoke. 
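To see what Post does without a UI framework, here is a hypothetical SynchronizationContext subclass of our own (not part of .NET) in which one dedicated thread plays the role of the UI thread, running a loop like the pseudo-code shown earlier for Application.Run. Post enqueues a delegate and returns immediately, as BeginInvoke does; Send is not overridden, so it falls back to the base class behavior of invoking the delegate synchronously on the calling thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical context: a single "loop" thread executes every posted delegate in order.
sealed class LoopSyncContext : SynchronizationContext, IDisposable
{
  readonly BlockingCollection<(SendOrPostCallback Callback, object State)> _queue =
    new BlockingCollection<(SendOrPostCallback Callback, object State)>();

  public Thread LoopThread { get; }

  public LoopSyncContext()
  {
    LoopThread = new Thread (() =>
    {
      // Code running on the loop thread sees this context as SynchronizationContext.Current.
      SetSynchronizationContext (this);
      // Blocks while the queue is empty; ends once CompleteAdding is called and the queue drains.
      foreach (var item in _queue.GetConsumingEnumerable())
        item.Callback (item.State);
    })
    { IsBackground = true };
    LoopThread.Start();
  }

  // Like BeginInvoke: enqueue the delegate and return without waiting.
  public override void Post (SendOrPostCallback d, object state) => _queue.Add ((d, state));

  // Stop accepting work, wait for the loop thread to run what's queued, then clean up.
  public void Dispose()
  {
    _queue.CompleteAdding();
    LoopThread.Join();
    _queue.Dispose();
  }
}
```

A worker thread holding a reference to such a context can Post to it exactly as the MyWindow example posts to _uiSyncContext, and the delegate will run on the loop thread.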


The Thread Pool 


Whenever you start a thread, a few hundred microseconds are spent organizing
such things as a fresh local variable stack. The thread pool cuts this overhead by having
a pool of pre-created recyclable threads. Thread pooling is essential for efficient
parallel programming and fine-grained concurrency; it allows short operations to
run without being overwhelmed with the overhead of thread startup.


There are a few things to be wary of when using pooled threads:

• You cannot set the Name of a pooled thread, making debugging more difficult
  (although you can attach a description when debugging in Visual Studio’s
  Threads window).
• Pooled threads are always background threads.
• Blocking pooled threads can degrade performance (see “Hygiene in the thread
  pool” on page 591).


You are free to change the priority of a pooled thread—it will be restored to normal 
when released back to the pool. 
You can determine whether you're currently executing on a pooled thread via the
property Thread.CurrentThread.IsThreadPoolThread.


Entering the thread pool 


The easiest way to explicitly run something on a pooled thread is to use Task.Run 
(we cover this in more detail in the following section): 


// Task is in System.Threading.Tasks
Task.Run (() => Console.WriteLine ("Hello from the thread pool")); 


Because tasks didn’t exist prior to .NET Framework 4.0, a common alternative is to
call ThreadPool.QueueUserWorkItem:


ThreadPool.QueueUserWorkItem (notUsed => Console.WriteLine ("Hello")); 
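You can confirm the difference between these entry points and a thread you create yourself by checking IsThreadPoolThread and IsBackground from inside each. A sketch (PoolCheck and its method names are ours; the last method borrows TaskCompletionSource, covered later in this section, to hand a value back from the queued work item):

```csharp
using System.Threading;
using System.Threading.Tasks;

static class PoolCheck
{
  // Task.Run queues the delegate to the thread pool (with the default scheduler).
  public static bool TaskRunUsesPool() =>
    Task.Run (() => Thread.CurrentThread.IsThreadPoolThread).Result;

  // A thread created explicitly is not a pooled thread.
  public static bool NewThreadUsesPool()
  {
    bool pooled = true;
    var t = new Thread (() => pooled = Thread.CurrentThread.IsThreadPoolThread);
    t.Start();
    t.Join();
    return pooled;
  }

  // Pooled threads are always background threads.
  public static bool QueuedWorkIsBackground()
  {
    var tcs = new TaskCompletionSource<bool>();
    ThreadPool.QueueUserWorkItem (_ => tcs.SetResult (Thread.CurrentThread.IsBackground));
    return tcs.Task.Result;
  }
}
```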


The following use the thread pool implicitly:

• ASP.NET Core and Web API application servers
• System.Timers.Timer and System.Threading.Timer
• The parallel programming constructs that we describe in Chapter 23
• The (legacy) BackgroundWorker class


Hygiene in the thread pool 


The thread pool serves another function, which is to ensure that a temporary excess
of compute-bound work does not cause CPU oversubscription. Oversubscription is
the condition of there being more active threads than CPU cores, with the OS having
to time-slice threads. Oversubscription hurts performance because time-slicing
requires expensive context switches and can invalidate the CPU caches that have
become essential in delivering performance to modern processors.


The CLR avoids oversubscription in the thread pool by queuing tasks and throttling
their startup. It begins by running as many concurrent tasks as there are hardware
cores, and then tunes the level of concurrency via a hill-climbing algorithm, continually
adjusting the workload in a particular direction. If throughput improves, it
continues in the same direction (otherwise it reverses). This ensures that it always
tracks the optimal performance curve, even in the face of competing process activity
on the computer.


The CLR’s strategy works best if two conditions are met:

• Work items are mostly short-running (<250 ms, or ideally <100 ms), so that the
  CLR has plenty of opportunities to measure and adjust.
• Jobs that spend most of their time blocked do not dominate the pool.
Blocking is troublesome because it gives the CLR the false idea that it’s loading up
the CPU. The CLR is smart enough to detect and compensate (by injecting more 
threads into the pool), although this can make the pool vulnerable to subsequent 
oversubscription. It also can introduce latency because the CLR throttles the rate at 
which it injects new threads, particularly early in an application's life (more so on 
client operating systems where it favors lower resource consumption). 


Maintaining good hygiene in the thread pool is particularly relevant when you want 
to fully utilize the CPU (e.g., via the parallel programming APIs in Chapter 23). 


Tasks 


A thread is a low-level tool for creating concurrency, and as such, it has limitations. 
In particular: 


• Although it’s easy to pass data into a thread that you start, there’s no easy way to
  get a “return value” back from a thread that you Join. You need to set up some
  kind of shared field. And if the operation throws an exception, catching and
  propagating that exception is equally painful.
• You can’t tell a thread to start something else when it’s finished; instead, you
  must Join it (blocking your own thread in the process).


These limitations discourage fine-grained concurrency; in other words, they make it
difficult to compose larger concurrent operations by combining smaller ones
(something essential for the asynchronous programming that we look at in following
sections). This in turn leads to greater reliance on manual synchronization
(locking, signaling, and so on) and the problems that go with it.


The direct use of threads also has performance implications that we discussed in
“The Thread Pool” on page 590. And should you need to run hundreds or thousands
of concurrent I/O-bound operations, a thread-based approach consumes
hundreds or thousands of megabytes of memory purely in thread overhead.


The Task class helps with all of these problems. Compared to a thread, a Task is
a higher-level abstraction: it represents a concurrent operation that might or might
not be backed by a thread. Tasks are compositional (you can chain them together 
through the use of continuations). They can use the thread pool to lessen startup 
latency, and with a TaskCompletionSource, they can employ a callback approach 
that avoids threads altogether while waiting on I/O-bound operations. 


The Task types were introduced in .NET Framework 4.0 as part of the parallel programming
library. However, they have since been enhanced (through the use of awaiters)
to play equally well in more general concurrency scenarios and are backing types
for C#’s asynchronous functions.


In this section, we ignore the features of tasks that are aimed 
specifically at parallel programming; we cover them instead in 
Chapter 23. 
Starting a Task


The easiest way to start a Task backed by a thread is with the static method
Task.Run (the Task class is in the System.Threading.Tasks namespace). Simply
pass in an Action delegate:


Task.Run (() => Console.WriteLine ("Foo")); 


Tasks use pooled threads by default, which are background
threads. This means that when the main thread ends, so do
any tasks that you create. Hence, to run these examples from a
Console application, you must block the main thread after
starting the task (for instance, by Waiting the task or by calling
Console.ReadLine):


static void Main()
{
  Task.Run (() => Console.WriteLine ("Foo"));
  Console.ReadLine();
}


In the book’s LINQPad companion samples, Console.ReadLine
is omitted because the LINQPad process keeps background
threads alive.


Calling Task.Run in this manner is similar to starting a thread as follows (except for 
the thread pooling implications that we discuss shortly): 


new Thread (() => Console.WriteLine ("Foo")).Start(); 


Task.Run returns a Task object that we can use to monitor its progress, rather like a 
Thread object. (Notice, however, that we didn't call Start after calling Task.Run 
because this method creates “hot” tasks; you can instead use Task’s constructor to 
create “cold” tasks although this is rarely done in practice.) 


You can track a task’s execution status via its Status property. 


Wait 


Calling Wait on a task blocks until it completes and is the equivalent of calling Join 
on a thread: 


Task task = Task.Run (() =>
{
  Thread.Sleep (2000);
  Console.WriteLine ("Foo");
});
Console.WriteLine (task.IsCompleted);  // False
task.Wait();                           // Blocks until task is complete


Wait lets you optionally specify a timeout and a cancellation token to end the wait 
early (see “Cancellation” on page 625). 
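The timeout overloads of Wait return a bool indicating whether the task completed in time, rather than throwing when the timeout elapses. In the sketch below (WaitDemo is our own name), the task blocks on a ManualResetEventSlim gate so that the first, short wait is certain to time out; opening the gate lets the second wait succeed:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class WaitDemo
{
  public static (bool FinishedEarly, bool Finished, TaskStatus Status) Run()
  {
    using var gate = new ManualResetEventSlim (false);
    Task task = Task.Run (() => gate.Wait());  // blocks until the gate is opened

    bool finishedEarly = task.Wait (50);       // times out: returns false, no exception
    gate.Set();                                // let the task finish
    bool finished = task.Wait (5000);          // true once the task has completed
    return (finishedEarly, finished, task.Status);
  }
}
```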
Long-running tasks


By default, the CLR runs tasks on pooled threads, which is ideal for short-running 
compute-bound work. For longer-running and blocking operations (such as our 
preceding example), you can prevent use of a pooled thread as follows: 


Task task = Task.Factory.StartNew (() => ...,
  TaskCreationOptions.LongRunning);
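With the default scheduler, current .NET implementations satisfy a LongRunning request by creating a dedicated thread outside the pool; treat this as an implementation detail rather than a documented guarantee. The sketch below (LongRunningDemo is our own name) checks IsThreadPoolThread from inside such a task, passing a Func<bool> so that StartNew returns a Task<bool>:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class LongRunningDemo
{
  public static bool RanOnPooledThread()
  {
    Task<bool> task = Task.Factory.StartNew (
      () => Thread.CurrentThread.IsThreadPoolThread,   // reported from inside the task
      TaskCreationOptions.LongRunning);
    return task.Result;
  }
}
```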


Running one long-running task on a pooled thread won't 
cause trouble; it's when you run multiple long-running tasks 
in parallel (particularly ones that block) that performance can 
suffer. And in that case, there are usually better solutions than 
TaskCreationOptions.LongRunning: 


• If the tasks are I/O bound, TaskCompletionSource and
  asynchronous functions let you implement concurrency
  with callbacks (continuations) instead of threads.
• If the tasks are compute bound, a producer/consumer
  queue lets you throttle the concurrency for those tasks,
  avoiding starvation for other threads and processes (see
  “Writing a Producer/Consumer Queue” on page 962 in
  Chapter 23).


Returning values 


Task has a generic subclass called Task<TResult>, which allows a task to emit a
return value. You can obtain a Task<TResult> by calling Task.Run with a Func<TResult>
delegate (or a compatible lambda expression) instead of an Action:


Task<int> task = Task.Run (() => { Console.WriteLine ("Foo"); return 3; }); 




You can obtain the result later by querying the Result property. If the task hasn't yet 
finished, accessing this property will block the current thread until the task finishes: 


int result = task.Result; // Blocks if not already finished 
Console.WriteLine (result); // 3 


In the following example, we create a task that uses LINQ to count the number of 
prime numbers in the first three million (+2) integers: 


Task<int> primeNumberTask = Task.Run (() =>
  Enumerable.Range (2, 3000000).Count (n =>
    Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0)));


Console.WriteLine ("Task running..."); 
Console.WriteLine ("The answer is " + primeNumberTask.Result); 




This writes “Task running...” and then, a few seconds later, writes the answer of
216816.
Task<TResult> can be thought of as a “future,” in that it
encapsulates a Result that becomes available later in time. 


Exceptions 


Unlike with threads, tasks conveniently propagate exceptions. So, if the code in your
task throws an unhandled exception (in other words, if your task faults), that exception
is automatically rethrown to whoever calls Wait() or accesses the Result
property of a Task<TResult>:


// Start a Task that throws a NullReferenceException:
Task task = Task.Run (() => { throw null; });
try
{
  task.Wait();
}
catch (AggregateException aex)
{
  if (aex.InnerException is NullReferenceException)
    Console.WriteLine ("Null!");
  else
    throw;
}


(The CLR wraps the exception in an AggregateException in order to play well with 
parallel programming scenarios; we discuss this in Chapter 23.) 


You can test for a faulted task without rethrowing the exception via the IsFaulted 
and IsCanceled properties of the Task. If both properties return false, no error 
occurred; if IsCanceled is true, an OperationCanceledException was thrown for 
that task (see “Cancellation” on page 625); if IsFaulted is true, another type of 
exception was thrown and the Exception property will indicate the error. 
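The following sketch (FaultDemo is our own name) inspects a faulted task through these properties instead of catching AggregateException. To wait for completion without triggering a rethrow, it waits on the task returned by Task.WhenAny, which completes successfully as soon as its argument completes, however that argument ended. Note that reading the Exception property also counts as observing the exception, a point that matters below.

```csharp
using System;
using System.Threading.Tasks;

static class FaultDemo
{
  static void Fail() => throw new InvalidOperationException ("boom");

  public static (bool IsFaulted, bool IsCanceled, Exception Inner) Run()
  {
    Task task = Task.Run (Fail);

    // The task returned by WhenAny completes successfully once 'task' completes,
    // so waiting on it doesn't rethrow the fault.
    Task.WhenAny (task).Wait();

    // Reading task.Exception also marks the exception as observed.
    return (task.IsFaulted, task.IsCanceled, task.Exception?.InnerException);
  }
}
```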


Exceptions and autonomous tasks 


With autonomous “set-and-forget” tasks (those for which you don’t rendezvous via
Wait() or Result, or a continuation that does the same), it’s good practice to explicitly
exception-handle the task code to avoid silent failure, just as you would with a
thread.
Ignoring exceptions is fine when an exception solely indicates
a failure to obtain a result that you’re no longer interested in. 
For example, if a user cancels a request to download a web 
page, we wouldn't care if it turns out that the web page didn't 
exist. 


Ignoring exceptions is problematic when an exception indicates
a bug in your program, for two reasons:

• The bug may have left your program in an invalid state.
• More exceptions may occur later as a result of the bug,
  and failure to log the initial error can make diagnosis
  difficult.


You can subscribe to unobserved exceptions at a global level via the static event
TaskScheduler.UnobservedTaskException; handling this event and logging the error
can make good sense.


There are a couple of interesting nuances on what counts as unobserved:

• Tasks waited upon with a timeout will generate an unobserved exception if the
  fault occurs after the timeout interval.
• The act of checking a task’s Exception property after it has faulted makes the
  exception “observed.”


Continuations 


A continuation says to a task, “when you've finished, continue by doing something 
else.” A continuation is usually implemented by a callback that executes once upon 
completion of an operation. There are two ways to attach a continuation to a task. 
The first is particularly significant because it’s used by C#’s asynchronous functions, 
as you'll see soon. We can demonstrate it with the prime number counting task that 
we wrote a short while ago in “Returning values” on page 594: 


Task<int> primeNumberTask = Task.Run (() =>
  Enumerable.Range (2, 3000000).Count (n =>
    Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0)));

var awaiter = primeNumberTask.GetAwaiter();
awaiter.OnCompleted (() =>
{
  int result = awaiter.GetResult();
  Console.WriteLine (result);       // Writes result
});


Calling GetAwaiter on the task returns an awaiter object whose OnCompleted 
method tells the antecedent task (primeNumberTask) to execute a delegate when it 
finishes (or faults). It’s valid to attach a continuation to an already-completed task, 
in which case the continuation will be scheduled to execute right away. 
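Here is a self-contained version of the pattern, using a trivial computation instead of the prime count (AwaiterDemo is our own name). Because OnCompleted returns immediately, the sketch blocks on a ManualResetEventSlim purely so that it can hand the continuation’s result back to its caller; real code would normally not block like this.

```csharp
using System.Threading;
using System.Threading.Tasks;

static class AwaiterDemo
{
  public static int Run()
  {
    Task<int> antecedent = Task.Run (() => 6 * 7);
    var awaiter = antecedent.GetAwaiter();
    int result = 0;

    using var done = new ManualResetEventSlim (false);
    awaiter.OnCompleted (() =>
    {
      result = awaiter.GetResult();   // would rethrow directly if the antecedent had faulted
      done.Set();
    });

    done.Wait();   // block only so this demo can return the continuation's result
    return result;
  }
}
```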
An awaiter is any object that exposes the two methods that
we've just seen (OnCompleted and GetResult) and a Boolean
property called IsCompleted. There’s no interface or base class
to unify all of these members (although OnCompleted is part of
the interface INotifyCompletion). We explain the significance
of the pattern in “Asynchronous Functions in C#” on page
605.


If an antecedent task faults, the exception is rethrown when the continuation code 
calls awaiter.GetResult(). Rather than calling GetResult, we could simply access 
the Result property of the antecedent. The benefit of calling GetResult is that if the 
antecedent faults, the exception is thrown directly without being wrapped in 
AggregateException, allowing for simpler and cleaner catch blocks. 
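The difference is easy to observe. In this sketch (UnwrapDemo is our own name), the same faulted Task<int> throws AggregateException when you read Result, but throws the original InvalidOperationException when you call GetResult on its awaiter:

```csharp
using System;
using System.Threading.Tasks;

static class UnwrapDemo
{
  static int Fail() => throw new InvalidOperationException ("boom");

  public static (Type ViaResult, Type ViaGetResult) Run()
  {
    Task<int> task = Task.Run (Fail);
    Type viaResult = null, viaGetResult = null;

    try { _ = task.Result; }                        // wraps the exception
    catch (Exception ex) { viaResult = ex.GetType(); }

    try { task.GetAwaiter().GetResult(); }          // rethrows it unwrapped
    catch (Exception ex) { viaGetResult = ex.GetType(); }

    return (viaResult, viaGetResult);
  }
}
```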


For nongeneric tasks, GetResult() has a void return value. Its useful function is 
then solely to rethrow exceptions. 


If a synchronization context is present, OnCompleted automatically captures it and
posts the continuation to that context. This is very useful in rich client applications
because it bounces the continuation back to the UI thread. In writing libraries, however,
it’s not usually desirable because the relatively expensive UI-thread bounce
should occur just once upon leaving the library rather than between method calls.
Hence, you can defeat it by using the ConfigureAwait method:


var awaiter = primeNumberTask.ConfigureAwait (false).GetAwaiter();


If no synchronization context is present (or you use ConfigureAwait(false)), the
continuation will (in general) execute on the same thread as the antecedent, avoiding
unnecessary overhead.


The other way to attach a continuation is by calling the task’s ContinueWith 
method: 


primeNumberTask.ContinueWith (antecedent =>
{
  int result = antecedent.Result;
  Console.WriteLine (result);          // Writes 216816
});


ContinueWith itself returns a Task, which is useful if you want to attach further
continuations. However, you must deal directly with AggregateException if the
task faults, and write extra code to marshal the continuation in UI applications (see
“Task Schedulers” on page 955 in Chapter 23). And in non-UI contexts, you must
specify TaskContinuationOptions.ExecuteSynchronously if you want the continuation
to execute on the same thread; otherwise, it will bounce to the thread pool.
ContinueWith is particularly useful in parallel programming scenarios; we cover it
in detail in “Continuations” on page 950 in Chapter 23.
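Because each call to ContinueWith returns a new task, continuations can be chained, with each one receiving its completed antecedent. A minimal sketch (ContinueWithDemo is our own name):

```csharp
using System.Threading.Tasks;

static class ContinueWithDemo
{
  public static string Run()
  {
    Task<int> antecedent = Task.Run (() => 21);

    Task<string> chained = antecedent
      .ContinueWith (a => a.Result * 2)            // 'a' is the completed antecedent
      .ContinueWith (a => "Answer: " + a.Result);  // receives the previous continuation

    return chained.Result;
  }
}
```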
TaskCompletionSource


We've seen how Task.Run creates a task that runs a delegate on a pooled (or nonpooled)
thread. Another way to create a task is with TaskCompletionSource.


TaskCompletionSource lets you create a task out of any operation that starts and 
finishes some time later. It works by giving you a “slave” task that you manually 
drive, by indicating when the operation finishes or faults. This is ideal for I/O-bound
work: you get all the benefits of tasks (with their ability to propagate return
values, exceptions, and continuations) without blocking a thread for the duration of 
the operation. 


To use TaskCompletionSource, you simply instantiate the class. It exposes a Task 
property that returns a task upon which you can wait and attach continuations— 
just as with any other task. The task, however, is controlled entirely by the 
TaskCompletionSource object via the following methods: 


public class TaskCompletionSource<TResult>
{
  public void SetResult (TResult result);
  public void SetException (Exception exception);
  public void SetCanceled();

  public bool TrySetResult (TResult result);
  public bool TrySetException (Exception exception);
  public bool TrySetCanceled();
  public bool TrySetCanceled (CancellationToken cancellationToken);
}


Calling any of these methods signals the task, putting it into a completed, faulted, or 
canceled state (we cover the latter in the section “Cancellation” on page 625). You're 
supposed to call one of these methods exactly once: if called again, SetResult, 
SetException, or SetCanceled will throw an exception, whereas the Try* methods 
return false. 
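A short sketch shows the difference between the Set* and Try* methods. SignalOnceDemo is a hypothetical name. Once the task has been signaled, TrySetResult returns false and SetResult throws an InvalidOperationException, while the task keeps its first result.

```csharp
using System;
using System.Threading.Tasks;

public static class SignalOnceDemo   // hypothetical demo class
{
  public static (bool firstTry, bool secondTry, bool setThrew, int result) Run()
  {
    var tcs = new TaskCompletionSource<int>();

    bool firstTry  = tcs.TrySetResult (42);   // true: the task is now completed
    bool secondTry = tcs.TrySetResult (99);   // false: already signaled

    bool setThrew = false;
    try { tcs.SetResult (100); }              // Set* throws instead of returning false
    catch (InvalidOperationException) { setThrew = true; }

    return (firstTry, secondTry, setThrew, tcs.Task.Result);   // Result is still 42
  }
}
```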


The following example prints 42 after waiting for five seconds: 


var tcs = new TaskCompletionSource<int>();

new Thread (() => { Thread.Sleep (5000); tcs.SetResult (42); })
  { IsBackground = true }
  .Start();

Task<int> task = tcs.Task;         // Our "slave" task.
Console.WriteLine (task.Result);   // 42


With TaskCompletionSource, we can write our own Run method: 


Task<TResult> Run<TResult> (Func<TResult> function)
{
  var tcs = new TaskCompletionSource<TResult>();
  new Thread (() =>
  {
    try { tcs.SetResult (function()); }
    catch (Exception ex) { tcs.SetException (ex); }
  }).Start();
  return tcs.Task;
}

Task<int> task = Run (() => { Thread.Sleep (5000); return 42; });


Calling this method is equivalent to calling Task.Factory.StartNew with the 
TaskCreationOptions.LongRunning option to request a nonpooled thread. 
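For comparison, here is the equivalent call through Task.Factory.StartNew. The helper names are hypothetical. LongRunning is only a hint to the scheduler. In current .NET implementations, however, the default scheduler honors it by creating a dedicated thread rather than borrowing a pooled one, and RanOnPoolThread checks exactly that.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo   // hypothetical demo class
{
  // Requests a nonpooled thread, much like the hand-written Run method.
  public static Task<TResult> RunLongRunning<TResult> (Func<TResult> function) =>
    Task.Factory.StartNew (function, TaskCreationOptions.LongRunning);

  // Reports whether the delegate ran on a thread-pool thread.
  public static bool RanOnPoolThread() =>
    RunLongRunning (() => Thread.CurrentThread.IsThreadPoolThread).Result;
}
```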


The real power of TaskCompletionSource is in creating tasks that don’t tie up 
threads. For instance, consider a task that waits for five seconds and then returns 
the number 42. We can write this without a thread by using the Timer class, which 
with the help of the CLR (and in turn, the OS) fires an event in x milliseconds (we 
revisit timers in Chapter 22): 


Task<int> GetAnswerToLife()
{
  var tcs = new TaskCompletionSource<int>();
  // Create a timer that fires once in 5000 ms:
  var timer = new System.Timers.Timer (5000) { AutoReset = false };
  timer.Elapsed += delegate { timer.Dispose(); tcs.SetResult (42); };
  timer.Start();
  return tcs.Task;
}


Hence, our method returns a task that completes five seconds later, with a result of 
42. By attaching a continuation to the task, we can write its result without blocking 
any thread: 


var awaiter = GetAnswerToLife().GetAwaiter(); 
awaiter.OnCompleted (() => Console.WriteLine (awaiter.GetResult())); 


We could make this more useful and turn it into a general-purpose Delay method 
by parameterizing the delay time and getting rid of the return value. This means 
having it return a Task instead of a Task<int>. However, there's no nongeneric ver- 
sion of TaskCompletionSource, which means we can't directly create a nongeneric 
Task. The workaround is simple: because Task<TResult> derives from Task, we create
a TaskCompletionSource<anything> and then implicitly convert the Task<anything>
that it gives us into a Task, like this:


var tcs = new TaskCompletionSource<object>(); 
Task task = tcs.Task; 


Now we can write our general-purpose Delay method: 


Task Delay (int milliseconds)
{
  var tcs = new TaskCompletionSource<object>();
  var timer = new System.Timers.Timer (milliseconds) { AutoReset = false };
  timer.Elapsed += delegate { timer.Dispose(); tcs.SetResult (null); };
  timer.Start();
  return tcs.Task;
}


Here’s how we can use it to write “42” after five seconds: 
Delay (5000).GetAwaiter().OnCompleted (() => Console.WriteLine (42)); 


Our use of TaskCompletionSource without a thread means that a thread is engaged 
only when the continuation starts, five seconds later. We can demonstrate this by 
starting 10,000 of these operations at once without error or excessive resource 
consumption: 


for (int i = 0; i < 10000; i++) 
Delay (5000).GetAwaiter().OnCompleted (() => Console.WriteLine (42)); 


Timers fire their callbacks on pooled threads, so after five sec- 
onds, the thread pool will receive 10,000 requests to call 
SetResult(null) on a TaskCompletionSource. If the requests 
arrive faster than they can be processed, the thread pool will 
respond by enqueuing and then processing them at the opti- 
mum level of parallelism for the CPU. This is ideal if the 
thread-bound jobs are short running, which is true in this 
case: the thread-bound job is merely the call to SetResult 
plus either the action of posting the continuation to the syn- 
chronization context (in a UI application) or otherwise the 
continuation itself (Console.WriteLine(42)). 
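The 10,000-operation experiment can be made self-contained as follows. This sketch rests on a few assumptions. ManyDelaysDemo and RunAsync are hypothetical names, and the delay is shortened to 100 ms. Task.WhenAll is used instead of OnCompleted so that the caller can learn when every continuation has run.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ManyDelaysDemo   // hypothetical demo class
{
  // The chapter's timer-based Delay method, repeated so the sketch compiles.
  static Task Delay (int milliseconds)
  {
    var tcs = new TaskCompletionSource<object>();
    var timer = new System.Timers.Timer (milliseconds) { AutoReset = false };
    timer.Elapsed += delegate { timer.Dispose(); tcs.SetResult (null); };
    timer.Start();
    return tcs.Task;
  }

  // Starts 'count' delays at once and returns how many continuations ran.
  // No thread is blocked while the timers are pending.
  public static async Task<int> RunAsync (int count, int milliseconds)
  {
    int completed = 0;
    var continuations = Enumerable.Range (0, count)
      .Select (_ => Delay (milliseconds).ContinueWith (
        ant => Interlocked.Increment (ref completed)))
      .ToArray();

    await Task.WhenAll (continuations);
    return completed;
  }
}
```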


Task.Delay 


The Delay method that we just wrote is sufficiently useful that it’s available as a
static method on the Task class:


Task.Delay (5000).GetAwaiter().OnCompleted (() => Console.WriteLine (42)); 
or: 
Task.Delay (5000).ContinueWith (ant => Console.WriteLine (42)); 


Task.Delay is the asynchronous equivalent of Thread.Sleep.
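One way to see the difference is to compare how long each call holds up its caller. In this sketch, DelayDemo and its members are hypothetical. Thread.Sleep keeps the calling thread for the full interval, whereas an async method that awaits Task.Delay hands back an incomplete task straight away.

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class DelayDemo   // hypothetical demo class
{
  static void SleepVersion() => Thread.Sleep (200);            // blocks the caller
  static async Task DelayVersion() => await Task.Delay (200);   // returns at once

  public static (long sleepMs, bool delayDoneOnReturn) Compare()
  {
    var sw = Stopwatch.StartNew();
    SleepVersion();                        // this thread is held for ~200 ms
    long sleepMs = sw.ElapsedMilliseconds;

    Task t = DelayVersion();               // hands back an incomplete task
    bool delayDoneOnReturn = t.IsCompleted;

    t.Wait();   // blocking here only so that the sketch can report its findings
    return (sleepMs, delayDoneOnReturn);
  }
}
```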


Principles of Asynchrony 


In demonstrating TaskCompletionSource, we ended up writing asynchronous meth- 
ods. In this section, we define exactly what asynchronous operations are and explain 
how this leads to asynchronous programming. 

Synchronous versus Asynchronous Operations 


A synchronous operation does its work before returning to the caller. 


An asynchronous operation can do (most or all of) its work after returning to the 
caller. 
The majority of methods that you write and call are synchronous. Examples include
List<T>.Add, Console.WriteLine, and Thread.Sleep. Asynchronous methods are
less common and initiate concurrency, because work continues in parallel to the 
caller. Asynchronous methods typically return quickly (or immediately) to the 
caller; thus, they are also called nonblocking methods. 


Most of the asynchronous methods that we've seen so far can be described as 
general-purpose methods: 


• Thread.Start
• Task.Run
• Methods that attach continuations to tasks


In addition, some of the methods that we discussed in “Synchronization Contexts”
on page 589 (Dispatcher.BeginInvoke, Control.BeginInvoke, and
SynchronizationContext.Post) are asynchronous, as are the methods that we wrote in
“TaskCompletionSource” on page 598, including Delay.


What Is Asynchronous Programming? 


The principle of asynchronous programming is that you write long-running (or 
potentially long-running) functions asynchronously. This is in contrast to the con- 
ventional approach of writing long-running functions synchronously, and then call- 
ing those functions from a new thread or task to introduce concurrency as required. 


The difference with the asynchronous approach is that concurrency is initiated 
inside the long-running function rather than from outside the function. This has 
two benefits: 


• I/O-bound concurrency can be implemented without tying up threads (as we
demonstrate in “TaskCompletionSource” on page 598), improving scalability
and efficiency.

• Rich-client applications end up with less code on worker threads, simplifying
thread safety.


This, in turn, leads to two distinct uses for asynchronous programming. The first is 
writing (typically server-side) applications that deal efficiently with a lot of concur- 
rent I/O. The challenge here is not thread safety (because there's usually minimal 
shared state) but thread efficiency; in particular, not consuming a thread per net- 
work request. So, in this context, it’s only I/O-bound operations that benefit from 
asynchrony. 


The second use is to simplify thread safety in rich-client applications. This is partic- 
ularly relevant as a program grows in size, because to deal with complexity, we typi- 
cally refactor larger methods into smaller ones, resulting in chains of methods that 
call one another (call graphs). 
With a traditional synchronous call graph, if any operation within the graph is
long-running, we must run the entire call graph on a worker thread to maintain a
responsive UI. Hence, we end up with a single concurrent operation that spans many
methods (coarse-grained concurrency), and this requires considering thread safety 
for every method in the graph. 


With an asynchronous call graph, we need not start a thread until it’s actually 
needed, typically low in the graph (or not at all in the case of I/O-bound opera- 
tions). All other methods can run entirely on the UI thread, with much-simplified 
thread safety. This results in fine-grained concurrency—a sequence of small concur- 
rent operations, between which execution bounces to the UI thread. 


To benefit from this, both I/O- and compute-bound opera- 
tions need to be written asynchronously; a good rule of thumb 
is to include anything that might take longer than 50 ms. 


(On the flip side, excessively fine-grained asynchrony can hurt 
performance, because asynchronous operations incur an over- 
head—see “Optimizations” on page 621.) 


In this chapter, we focus mostly on the rich-client scenario, which is the more com- 
plex of the two. In Chapter 16, we give two examples that illustrate the I/O-bound 
scenario (see “Concurrency with TCP” on page 719 and “Writing an HTTP Server” 
on page 710). 


The UWP framework encourages asynchronous program- 
ming to the point where synchronous versions of some long- 
running methods are either not exposed or throw exceptions. 
Instead, you must call asynchronous methods that return 
tasks (or objects that can be converted into tasks via the 
AsTask extension method). 


Asynchronous Programming and Continuations 


Tasks are ideally suited to asynchronous programming, because they support con- 
tinuations, which are essential for asynchrony (consider the Delay method that we 
wrote in “TaskCompletionSource” on page 598). In writing Delay, we used 
TaskCompletionSource, which is a standard way to implement “bottom-level” I/O- 
bound asynchronous methods. 


For compute-bound methods, we use Task.Run to initiate thread-bound concur- 
rency. Simply by returning the task to the caller, we create an asynchronous method. 
What distinguishes asynchronous programming is that we aim to do so lower in the 
call graph, so that in rich-client applications, higher-level methods can remain on 
the UI thread and access controls and shared state without thread-safety issues. To 
illustrate, consider the following method that computes and counts prime numbers, 
using all available cores (we discuss ParallelEnumerable in Chapter 23): 


int GetPrimesCount (int start, int count)
{
  return
    ParallelEnumerable.Range (start, count).Count (n =>
      Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0));
}


The details of how this works are unimportant; what matters is that it can take a 
while to run. We can demonstrate this by writing another method to call it: 


void DisplayPrimeCounts()
{
  for (int i = 0; i < 10; i++)
    Console.WriteLine (GetPrimesCount (i*1000000 + 2, 1000000) +
      " primes between " + (i*1000000) + " and " + ((i+1)*1000000-1));
  Console.WriteLine ("Done!");
}


Here's the output:


78498 primes between 0 and 999999
70435 primes between 1000000 and 1999999
67883 primes between 2000000 and 2999999 
66330 primes between 3000000 and 3999999 
65367 primes between 4000000 and 4999999 
64336 primes between 5000000 and 5999999 
63799 primes between 6000000 and 6999999 
63129 primes between 7000000 and 7999999 
62712 primes between 8000000 and 8999999 
62090 primes between 9000000 and 9999999 


Now we have a call graph, with DisplayPrimeCounts calling GetPrimesCount. The 
former uses Console.WriteLine for simplicity, although in reality it would more 
likely be updating UI controls in a rich-client application, as we demonstrate later. 
We can initiate coarse-grained concurrency for this call graph as follows: 


Task.Run (() => DisplayPrimeCounts()); 


With a fine-grained asynchronous approach, we instead start by writing an asyn- 
chronous version of GetPrimesCount: 


Task<int> GetPrimesCountAsync (int start, int count)
{
  return Task.Run (() =>
    ParallelEnumerable.Range (start, count).Count (n =>
      Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0)));
}


Why Language Support Is Important 


Now we must modify DisplayPrimeCounts so that it calls GetPrimesCountAsync. 
This is where C#’s await and async keywords come into play, because to do so 
otherwise is trickier than it sounds. If we simply modify the loop as follows: 


for (int i = 0; i < 10; i++)
{
  var awaiter = GetPrimesCountAsync (i*1000000 + 2, 1000000).GetAwaiter();
  awaiter.OnCompleted (() =>
    Console.WriteLine (awaiter.GetResult() + " primes between..."));
}
Console.WriteLine ("Done");


the loop will rapidly spin through 10 iterations (the methods being nonblocking) 
and all 10 operations will execute in parallel (followed by a premature “Done”). 


Executing these tasks in parallel is undesirable in this case 
because their internal implementations are already parallel- 
ized; it will only make us wait longer to see the first results 
(and muck up the ordering). 


There is a much more common reason, however, for needing 
to serialize the execution of tasks, which is that Task B 
depends on the result of Task A. For example, in fetching a 
web page, a DNS lookup must precede the HTTP request. 


To get them running sequentially, we must trigger the next loop iteration from the 
continuation itself. This means eliminating the for loop and resorting to a recursive 
call in the continuation: 


void DisplayPrimeCounts()
{
  DisplayPrimeCountsFrom (0);
}

void DisplayPrimeCountsFrom (int i)
{
  var awaiter = GetPrimesCountAsync (i*1000000 + 2, 1000000).GetAwaiter();
  awaiter.OnCompleted (() =>
  {
    Console.WriteLine (awaiter.GetResult() + " primes between...");
    if (++i < 10) DisplayPrimeCountsFrom (i);
    else Console.WriteLine ("Done");
  });
}


It gets even worse if we want to make DisplayPrimeCounts itself asynchronous,
returning a task that it signals upon completion. Accomplishing this requires
creating a TaskCompletionSource:


Task DisplayPrimeCountsAsync()
{
  var machine = new PrimesStateMachine();
  machine.DisplayPrimeCountsFrom (0);
  return machine.Task;
}

class PrimesStateMachine
{
  TaskCompletionSource<object> _tcs = new TaskCompletionSource<object>();
  public Task Task { get { return _tcs.Task; } }

  public void DisplayPrimeCountsFrom (int i)
  {
    var awaiter = GetPrimesCountAsync (i*1000000+2, 1000000).GetAwaiter();
    awaiter.OnCompleted (() =>
    {
      Console.WriteLine (awaiter.GetResult());
      if (++i < 10) DisplayPrimeCountsFrom (i);
      else { Console.WriteLine ("Done"); _tcs.SetResult (null); }
    });
  }
}


Fortunately, C#’s asynchronous functions do all of this work for us. With the async 
and await keywords, we need only write this: 


async Task DisplayPrimeCountsAsync()
{
  for (int i = 0; i < 10; i++)
    Console.WriteLine (await GetPrimesCountAsync (i*1000000 + 2, 1000000) +
      " primes between " + (i*1000000) + " and " + ((i+1)*1000000-1));
  Console.WriteLine ("Done!");
}


Consequently, async and await are essential for implementing asynchrony without 
excessive complexity. Let’s now see how these keywords work. 
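Because await preserves the loop counter and other local variables across each continuation, the async version is easy to run and check. Here is a scaled-down sketch. It assumes hypothetical names (PrimeReport, GetPrimeCountLinesAsync), uses blocks of 1,000 instead of 1,000,000, and collects the lines into a list rather than writing them to the console.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PrimeReport   // hypothetical demo class
{
  static Task<int> GetPrimesCountAsync (int start, int count) =>
    Task.Run (() =>
      ParallelEnumerable.Range (start, count).Count (n =>
        Enumerable.Range (2, (int)Math.Sqrt (n) - 1).All (i => n % i > 0)));

  // Same shape as DisplayPrimeCountsAsync, scaled down to blocks of 1,000.
  public static async Task<List<string>> GetPrimeCountLinesAsync (int blocks)
  {
    var lines = new List<string>();
    for (int i = 0; i < blocks; i++)      // i is preserved across each await
    {
      int primes = await GetPrimesCountAsync (i * 1000 + 2, 1000);
      lines.Add (primes + " primes between " + (i * 1000) +
                 " and " + ((i + 1) * 1000 - 1));
    }
    lines.Add ("Done!");
    return lines;
  }
}
```

With two blocks, the lines are "168 primes between 0 and 999", "135 primes between 1000 and 1999", and "Done!". The blocks run strictly one after another, even though there is no explicit recursion or state machine in sight.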


Another way of looking at the problem is that imperative 
looping constructs (for, foreach, and so on) do not mix well 
with continuations, because they rely on the current local state 
of the method (“how many more times is this loop going to 
run?”). 


Although the async and await keywords offer one solution, 
it’s sometimes possible to solve it in another way by replacing 
the imperative looping constructs with the functional equiva- 
lent (in other words, LINQ queries). This is the basis of Reac- 
tive Framework (Rx) and can be a good option when you want 
to execute query operators over the result—or combine multi- 
ple sequences. The price to pay is that to avoid blocking, Rx 
operates over push-based sequences, which can be conceptu- 
ally tricky. 
Asynchronous Functions in C#


The async and await keywords let you write asynchronous code that has the same 
structure and simplicity as synchronous code while eliminating the “plumbing” of 
asynchronous programming. 


Awaiting 


The await keyword simplifies the attaching of continuations. Starting with a basic 
scenario, the compiler expands this: 


var result = await expression; 
statement(s); 
into something functionally similar to this:


var awaiter = expression.GetAwaiter();
awaiter.OnCompleted (() =>
{
  var result = awaiter.GetResult();
  statement(s);
});


The compiler also emits code to short-circuit the continuation 
in case of synchronous completion (see “Optimizations” on 
page 621) and to handle various nuances that we pick up in 
later sections. 


To demonstrate, let’s revisit the asynchronous method that we wrote previously that 
computes and counts prime numbers: 


Task<int> GetPrimesCountAsync (int start, int count)
{
  return Task.Run (() =>
    ParallelEnumerable.Range (start, count).Count (n =>
      Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0)));
}


With the await keyword, we can call it as follows: 


int result = await GetPrimesCountAsync (2, 1000000); 
Console.WriteLine (result); 


To compile, we need to add the async modifier to the containing method: 


async void DisplayPrimesCount()
{
  int result = await GetPrimesCountAsync (2, 1000000);
  Console.WriteLine (result);
}


The async modifier instructs the compiler to treat await as a keyword rather than 
an identifier should an ambiguity arise within that method (this ensures that code 
written prior to C# 5 that might use await as an identifier will still compile without 
error). The async modifier can be applied only to methods (and lambda expres- 
sions) that return void or (as you'll see later) a Task or Task<TResult>. 


The async modifier is similar to the unsafe modifier in that it 
has no effect on a method’s signature or public metadata; it 
affects only what happens inside the method. For this reason, 
it makes no sense to use async in an interface. However, it is 
legal, for instance, to introduce async when overriding a non- 
async virtual method, as long as you keep the signature the 
same. 
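The following sketch illustrates the last point. The class names are hypothetical. A non-async virtual method is overridden by an async one with the same signature.

```csharp
using System.Threading.Tasks;

// Hypothetical base class: its virtual method is not async.
public class AnswerSource
{
  public virtual Task<int> GetAnswerAsync() => Task.FromResult (0);
}

// Hypothetical derived class: the override adds async but keeps the
// signature unchanged, so callers treat both versions identically.
public class DelayedAnswerSource : AnswerSource
{
  public override async Task<int> GetAnswerAsync()
  {
    await Task.Delay (10);
    return 21 * 2;
  }
}
```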


Methods with the async modifier are called asynchronous functions, because they 
themselves are typically asynchronous. To see why, let’s look at how execution pro- 
ceeds through an asynchronous function. 







Upon encountering an await expression, execution (normally) returns to the 
caller—rather like with yield return in an iterator. But before returning, the run- 
time attaches a continuation to the awaited task, ensuring that when the task com- 
pletes, execution jumps back into the method and continues where it left off. If the 
task faults, its exception is rethrown; otherwise, its return value is assigned to the
await expression. We can summarize everything we just said by looking at the logi- 
cal expansion of the asynchronous method we just examined: 


void DisplayPrimesCount()
{
  var awaiter = GetPrimesCountAsync (2, 1000000).GetAwaiter();
  awaiter.OnCompleted (() =>
  {
    int result = awaiter.GetResult();
    Console.WriteLine (result);
  });
}

The expression upon which you await is typically a task; however, any object with a 
GetAwaiter method that returns an awaiter (implementing INotifyCompletion.OnCompleted
and with an appropriately typed GetResult method and a bool
IsCompleted property) will satisfy the compiler. 


Notice that our await expression evaluates to an int type; this is because the expres- 
sion that we awaited was a Task<int> (whose GetAwaiter().GetResult() method 
returns an int). 


Awaiting a nongeneric task is legal and generates a void expression: 


await Task.Delay (5000); 
Console.WriteLine ("Five seconds passed!"); 
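As noted earlier, the compiler relies only on the awaiter pattern, not on Task itself. The following minimal custom awaitable shows this. ReadyValue<T> and ReadyValueAwaiter<T> are hypothetical types, not part of .NET. Their awaiter reports IsCompleted as true, so an await simply calls GetResult and carries on without scheduling a continuation.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaitable that already holds its value.
public class ReadyValue<T>
{
  readonly T _value;
  public ReadyValue (T value) => _value = value;
  public ReadyValueAwaiter<T> GetAwaiter() => new ReadyValueAwaiter<T> (_value);
}

// Hypothetical awaiter: IsCompleted, GetResult, and INotifyCompletion.
public struct ReadyValueAwaiter<T> : INotifyCompletion
{
  readonly T _value;
  public ReadyValueAwaiter (T value) => _value = value;

  public bool IsCompleted => true;    // await skips OnCompleted entirely
  public T GetResult() => _value;     // becomes the value of the await expression

  // Only called if IsCompleted were false; here we just run the continuation.
  public void OnCompleted (Action continuation) => continuation();
}

public static class ReadyValueDemo   // hypothetical demo class
{
  public static async Task<int> AddAsync()
  {
    int a = await new ReadyValue<int> (40);   // evaluates to an int
    int b = await new ReadyValue<int> (2);
    return a + b;
  }
}
```

Because neither await ever suspends, AddAsync runs to completion synchronously and returns an already-completed task.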


Capturing local state 


The real power of await expressions is that they can appear almost anywhere in 
code. Specifically, an await expression can appear in place of any expression (within 
an asynchronous function) except inside a lock statement or an unsafe context.


In the following example, we await inside a loop: 


async void DisplayPrimeCounts()
{
  for (int i = 0; i < 10; i++)
    Console.WriteLine (await GetPrimesCountAsync (i*1000000+2, 1000000));
}


Upon first executing GetPrimesCountAsync, execution returns to the caller by virtue 
of the await expression. When the method completes (or faults), execution resumes 
where it left off, with the values of local variables and loop counters preserved. 


Without the await keyword, the simplest equivalent might be the example we wrote 
in “Why Language Support Is Important” on page 603. The compiler, however, takes 
the more general strategy of refactoring such methods into state machines (rather
like it does with iterators). 


The compiler relies on continuations (via the awaiter pattern) to resume execution 
after an await expression. This means that if running on the UI thread of a rich- 
client application, the synchronization context ensures execution resumes on the 
same thread. Otherwise, execution resumes on whatever thread the task finished on. 
The change-of-thread does not affect the order of execution and is of little conse- 
quence unless you're somehow relying on thread affinity, perhaps through the use of 
thread-local storage (see “Thread-Local Storage” on page 914 in Chapter 22). It’s like 
touring a city and hailing taxis to get from one destination to another. With a syn- 
chronization context, you'll always get the same taxi; with no synchronization con- 
text, you'll usually get a different taxi each time. In either case, though, the journey 
is the same. 


Awaiting in a UI


We can demonstrate asynchronous functions in a more practical context by writing 
a simple UI that remains responsive while calling a compute-bound method. Let's 
begin with a synchronous solution: 


class TestUI : Window
{
  Button _button = new Button { Content = "Go" };
  TextBlock _results = new TextBlock();

  public TestUI()
  {
    var panel = new StackPanel();
    panel.Children.Add (_button);
    panel.Children.Add (_results);
    Content = panel;
    _button.Click += (sender, args) => Go();
  }

  void Go()
  {
    for (int i = 1; i < 5; i++)
      _results.Text += GetPrimesCount (i * 1000000, 1000000) +
        " primes between " + (i*1000000) + " and " + ((i+1)*1000000-1) +
        Environment.NewLine;
  }

  int GetPrimesCount (int start, int count)
  {
    return ParallelEnumerable.Range (start, count).Count (n =>
      Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0));
  }
}


Upon pressing the “Go” button, the application becomes unresponsive for the time 
it takes to execute the compute-bound code. There are two steps in asynchronizing 
this; the first is to switch to the asynchronous version of GetPrimesCount that we
used in previous examples: 


Task<int> GetPrimesCountAsync (int start, int count)
{
  return Task.Run (() =>
    ParallelEnumerable.Range (start, count).Count (n =>
      Enumerable.Range (2, (int)Math.Sqrt(n)-1).All (i => n % i > 0)));
}


The second step is to modify Go to call GetPrimesCountAsync: 


async void Go()
{
  _button.IsEnabled = false;
  for (int i = 1; i < 5; i++)
    _results.Text += await GetPrimesCountAsync (i * 1000000, 1000000) +
      " primes between " + (i*1000000) + " and " + ((i+1)*1000000-1) +
      Environment.NewLine;
  _button.IsEnabled = true;
}


This illustrates the simplicity of programming with asynchronous functions: you 
program as you would synchronously, but call asynchronous functions instead of 
blocking functions and await them. Only the code within GetPrimesCountAsync 
runs on a worker thread; the code in Go “leases” time on the UI thread. We could say 
that Go executes pseudoconcurrently to the message loop (in that its execution is 
interspersed with other events that the UI thread processes). With this pseudocon- 
currency, the only point at which preemption can occur is during an await. This 
simplifies thread safety: in our case, the only problem that this could cause is reen- 
trancy (clicking the button again while it’s running, which we avoid by disabling the 
button). True concurrency occurs lower in the call stack, inside code called by 
Task.Run. To benefit from this model, truly concurrent code avoids accessing 
shared state or UI controls. 


To give another example, suppose that instead of calculating prime numbers, we 
want to download several web pages and sum their lengths. .NET Core exposes
numerous task-returning asynchronous methods, including those of the WebClient
class in System.Net. Its DownloadDataTaskAsync method asynchronously downloads
a URI to a byte array, returning a Task<byte[]>, so by awaiting it, we get a
byte[]. Let’s now rewrite our Go method:


async void Go()
{
  _button.IsEnabled = false;
  string[] urls = "www.albahari.com www.oreilly.com www.linqpad.net".Split();
  int totalLength = 0;
  try
  {
    foreach (string url in urls)
    {
      var uri = new Uri ("http://" + url);
      byte[] data = await new WebClient().DownloadDataTaskAsync (uri);
      _results.Text += "Length of " + url + " is " + data.Length +
                       Environment.NewLine;
      totalLength += data.Length;
    }
    _results.Text += "Total length: " + totalLength;
  }
  catch (WebException ex)
  {
    _results.Text += "Error: " + ex.Message;
  }
  finally { _button.IsEnabled = true; }
}


Again, this mirrors how we’d write it synchronously—including the use of catch
and finally blocks. Even though execution returns to the caller after the first 
await, the finally block does not execute until the method has logically completed 
(by virtue of all its code executing—or an early return or unhandled exception). 
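The following sketch records the order of events. FinallyDemo is a hypothetical name. It shows that execution reaches the caller at the first await, while the finally block runs only once the method has logically completed.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FinallyDemo   // hypothetical demo class
{
  public static async Task RunAsync (List<string> log)
  {
    try
    {
      log.Add ("before await");
      await Task.Delay (100);     // execution returns to the caller here...
      log.Add ("after await");
    }
    finally
    {
      log.Add ("finally");        // ...but finally waits for logical completion
    }
  }
}
```

If the caller adds its own entry to the log immediately after calling RunAsync, that entry appears after "before await" but ahead of both "after await" and "finally".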


It can be helpful to consider exactly what’s happening underneath. First, we need to 
revisit the pseudocode that runs the message loop on the UI thread: 


Set synchronization context for this thread to WPF sync context
while (!thisApplication.Ended)
{
  wait for something to appear in message queue
  Got something: what kind of message is it?
    Keyboard/mouse message -> fire an event handler
    User BeginInvoke/Invoke message -> execute delegate
}


Event handlers that we attach to UI elements execute via this message loop. When 
our Go method runs, execution proceeds as far as the await expression, and then 
returns to the message loop (freeing the UI to respond to further events). However, 
the compiler’s expansion of await ensures that before returning, a continuation is 
set up such that execution resumes where it left off upon completion of the task. 
And because we awaited on a UI thread, the continuation posts to the synchronization
context, which executes it via the message loop, keeping our entire Go method
executing pseudo-concurrently on the UI thread. True (I/O-bound) concurrency 
occurs within the implementation of DownloadDataTaskAsync. 


Comparison to coarse-grained concurrency 


Asynchronous programming was difficult prior to C# 5, not only because there was 
no language support, but also because the .NET Framework exposed asynchronous 
functionality through clumsy patterns called the EAP and the APM (see “Obsolete 
Patterns” on page 633) rather than task-returning methods. 


The popular workaround was coarse-grained concurrency (in fact, there was even a 
type called BackgroundWorker to help with that). Returning to our original synchro- 
nous example with GetPrimesCount, we can demonstrate coarse-grained asyn- 
chrony by modifying the button’s event handler, as follows: 
_button.Click += (sender, args) =>
{
  _button.IsEnabled = false;
  Task.Run (() => Go());
};
(We've chosen to use Task.Run rather than BackgroundWorker because the latter 
would do nothing to simplify our particular example.) In either case, the end result 
is that our entire synchronous call graph (Go plus GetPrimesCount) runs on a 
worker thread. And because Go updates UI elements, we must now litter our code
with Dispatcher.BeginInvoke:


void Go()
{
  for (int i = 1; i < 5; i++)
  {
    int result = GetPrimesCount (i * 1000000, 1000000);
    Dispatcher.BeginInvoke (new Action (() =>
      _results.Text += result + " primes between " + (i*1000000) +
      " and " + ((i+1)*1000000-1) + Environment.NewLine));
  }
  Dispatcher.BeginInvoke (new Action (() => _button.IsEnabled = true));
}


Unlike with the asynchronous version, the loop itself runs on a worker thread. This 
might seem innocuous, and yet, even in this simple case, our use of multithreading 
has introduced a race condition. (Can you spot it? If not, try running the program: 
it will almost certainly become apparent.) 


Implementing cancellation and progress reporting creates more possibilities for 
thread-safety errors, as does any additional code in the method. For instance, sup- 
pose that the upper limit for the loop is not hardcoded, but comes from a method 
call: 


for (int i = 1; i < GetUpperBound(); i++) 


Now suppose that GetUpperBound() reads the value from a lazily-loaded configura- 
tion file, which loads from disk upon first call. All of this code now runs on your 
worker thread, code that’s most likely not thread-safe. This is the danger of starting 
worker threads high in the call graph. 


Writing Asynchronous Functions 


With any asynchronous function, you can replace the void return type with a Task 
to make the method itself usefully asynchronous (and awaitable). No further 
changes are required: 


async Task PrintAnswerToLife()   // We can return Task instead of void
{
  await Task.Delay (5000);
  int answer = 21 * 2;
  Console.WriteLine (answer);
}


Notice that we don’t explicitly return a task in the method body. The compiler man- 
ufactures the task, which it signals upon completion of the method (or upon an 
unhandled exception). This makes it easy to create asynchronous call chains: 


async Task Go()
{
  await PrintAnswerToLife();
  Console.WriteLine ("Done");
}


And because we've declared Go with a Task return type, Go itself is awaitable. 


The compiler expands asynchronous functions that return tasks into code that uses 
TaskCompletionSource to create a task that it then signals or faults. 


Nuances aside, we can expand PrintAnswerToLife into the following functional 
equivalent: 


Task PrintAnswerToLife()
{
  var tcs = new TaskCompletionSource<object>();
  var awaiter = Task.Delay (5000).GetAwaiter();
  awaiter.OnCompleted (() =>
  {
    try
    {
      awaiter.GetResult();    // Re-throw any exceptions
      int answer = 21 * 2;
      Console.WriteLine (answer);
      tcs.SetResult (null);
    }
    catch (Exception ex) { tcs.SetException (ex); }
  });
  return tcs.Task;
}


Hence, whenever a task-returning asynchronous method finishes, execution jumps 
back to whatever awaited it (by virtue of a continuation). 


In a rich-client scenario, execution bounces at this point back
to the UI thread (if it’s not already on the UI thread). Otherwise,
it continues on whatever thread the continuation came
back on. This means that there’s no latency cost in bubbling
up asynchronous call graphs, other than the first “bounce” if it
was UI-thread-initiated.


Returning Task<TResult> 


You can return a Task<TResult> if the method body returns TResult: 
async Task<int> GetAnswerToLife()
{
  await Task.Delay (5000);
  int answer = 21 * 2;
  return answer;    // Method has return type Task<int>; we return int
} 
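Following the same pattern as the PrintAnswerToLife expansion shown earlier, here is a sketch of a functional equivalent of GetAnswerToLife. It is an illustration, not the compiler's actual output, and the delayMs parameter is an added assumption so that the sketch can run quickly. The difference from the earlier expansion is that the source is a TaskCompletionSource<int>, signaled with the computed answer:

```csharp
using System;
using System.Threading.Tasks;

static class ExpandedGetAnswer
{
  // Sketch of a functional equivalent of:
  //   async Task<int> GetAnswerToLife() { await Task.Delay (5000); return 21 * 2; }
  // delayMs is an illustrative addition, not part of the original method.
  public static Task<int> GetAnswerToLife (int delayMs = 5000)
  {
    var tcs = new TaskCompletionSource<int>();
    var awaiter = Task.Delay (delayMs).GetAwaiter();
    awaiter.OnCompleted (() =>
    {
      try
      {
        awaiter.GetResult();          // Rethrow any exception from Delay
        int answer = 21 * 2;
        tcs.SetResult (answer);       // Signal with a value rather than null
      }
      catch (Exception ex) { tcs.SetException (ex); }
    });
    return tcs.Task;
  }

  static void Main() =>
    Console.WriteLine (GetAnswerToLife (100).GetAwaiter().GetResult());   // 42
}
```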


Internally, this results in the TaskCompletionSource being signaled with a value
rather than null. We can demonstrate GetAnswerToLife by calling it from
PrintAnswerToLife (which, in turn, is called from Go):


async Task Go()
{
  await PrintAnswerToLife();
  Console.WriteLine ("Done");
}

async Task PrintAnswerToLife()
{
  int answer = await GetAnswerToLife();
  Console.WriteLine (answer);
}

async Task<int> GetAnswerToLife()
{
  await Task.Delay (5000);
  int answer = 21 * 2;
  return answer;
}


In effect, we've refactored our original PrintAnswerToLife into two methods—with 
the same ease as if we were programming synchronously. The similarity to synchronous
programming is intentional; here's the synchronous equivalent of our call
graph, for which calling Go() gives the same result after blocking for five seconds: 
void Go()
{
  PrintAnswerToLife();
  Console.WriteLine ("Done");
}

void PrintAnswerToLife()
{
  int answer = GetAnswerToLife();
  Console.WriteLine (answer);
}

int GetAnswerToLife()
{
  Thread.Sleep (5000);
  int answer = 21 * 2;
  return answer;
}
This also illustrates the basic principle of how to design with
asynchronous functions in C#: 


1. Write your methods synchronously. 


2. Replace synchronous method calls with asynchronous 
method calls, and await them. 


3. Except for “top-level” methods (typically event handlers 
for UI controls), upgrade your asynchronous methods’ 
return types to Task or Task<TResult> so that they’re 
awaitable. 


The compiler’s ability to manufacture tasks for asynchronous functions means that 
for the most part, you need to explicitly instantiate a TaskCompletionSource only in 
(the relatively rare case of) bottom-level methods that initiate I/O-bound concurrency.
(And for methods that initiate compute-bound concurrency, you create the
task with Task.Run.) 


Asynchronous call graph execution 
To see exactly how this executes, it’s helpful to rearrange our code as follows: 


async Task Go()
{
  var task = PrintAnswerToLife();
  await task; Console.WriteLine ("Done");
}

async Task PrintAnswerToLife()
{
  var task = GetAnswerToLife();
  int answer = await task; Console.WriteLine (answer);
}

async Task<int> GetAnswerToLife()
{
  var task = Task.Delay (5000);
  await task; int answer = 21 * 2; return answer;
}


Go calls PrintAnswerToLife, which calls GetAnswerToLife, which calls Delay and 
then awaits. The await causes execution to return to PrintAnswerToLife, which 
itself awaits, returning to Go, which also awaits and returns to the caller. All of this 
happens synchronously on the thread that called Go; this is the brief synchronous 
phase of execution. 


Five seconds later, the continuation on Delay fires and execution returns to 
GetAnswerToLife on a pooled thread. (If we started on a UI thread, execution now 
bounces to that thread.) The remaining statements in GetAnswerToLife then run, 
after which the method’s Task<int> completes with a result of 42 and executes the 
continuation in PrintAnswerToLife, which executes the remaining statements in
that method. The process continues until Go’s task is signaled as complete. 


Execution flow matches the synchronous call graph that we showed earlier because
we’re following a pattern whereby we await every asynchronous method immediately
after calling it. This creates a sequential flow with no parallelism or overlapping
execution within the call graph. Each await expression creates a gap in
execution, after which the program resumes where it left off.


Parallelism 


Calling an asynchronous method without awaiting it allows the code that follows to 
execute in parallel. You might have noticed in earlier examples that we had a button 
whose event handler called Go, as follows: 


_button.Click += (sender, args) => Go(); 


Despite Go being an asynchronous method, we didn't await it, and this is indeed 
what facilitates the concurrency needed to maintain a responsive UI. 


We can use this same principle to run two asynchronous operations in parallel: 


var task1 = PrintAnswerToLife();
var task2 = PrintAnswerToLife();
await task1; await task2;


(By awaiting both operations afterward, we “end” the parallelism at that point. Later, 
we describe how the WhenAll task combinator helps with this pattern.)
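To see the overlap in practice, the following console sketch times two calls that are both started before either is awaited, and also shows the Task.WhenAll combinator mentioned above. The ParallelAwaitDemo class and the delayMs parameter are illustrative additions. Because a console app has no synchronization context, the continuations simply run on pooled threads:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ParallelAwaitDemo
{
  static async Task<int> GetAnswerToLife (int delayMs)
  {
    await Task.Delay (delayMs);
    return 21 * 2;
  }

  // Both calls start before either is awaited, so the two delays overlap
  // and the total elapsed time is roughly one delay rather than two.
  public static async Task<(int First, int Second, long ElapsedMs)> RunBoth (int delayMs)
  {
    var sw = Stopwatch.StartNew();
    Task<int> task1 = GetAnswerToLife (delayMs);
    Task<int> task2 = GetAnswerToLife (delayMs);
    int first = await task1;
    int second = await task2;
    return (first, second, sw.ElapsedMilliseconds);
  }

  // The same pattern, expressed with the Task.WhenAll combinator.
  public static async Task<int[]> RunBothWithWhenAll (int delayMs) =>
    await Task.WhenAll (GetAnswerToLife (delayMs), GetAnswerToLife (delayMs));

  static async Task Main()
  {
    var result = await RunBoth (1000);
    Console.WriteLine ($"{result.First}, {result.Second} after about {result.ElapsedMs} ms");
  }
}
```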


Concurrency created in this manner occurs whether or not the operations are initiated
on a UI thread, although there's a difference in how it occurs. In both cases, we
get the same “true” concurrency occurring in the bottom-level operations that initiate
it (such as Task.Delay, or code farmed to Task.Run). Methods above this in
the call stack will be subject to true concurrency only if the operation was initiated
without a synchronization context present; otherwise they will be subject to the
pseudoconcurrency (and simplified thread safety) that we talked about earlier,
whereby the only places at which we can be preempted are at await statements.
This lets us, for instance, define a shared field, _x, and increment it in
GetAnswerToLife without locking:


async Task<int> GetAnswerToLife()
{
  _x++;
  await Task.Delay (5000);
  return 21 * 2;
}


(We would, though, be unable to assume that _x had the same value before and after 
the await.) 
Asynchronous Lambda Expressions
Just as ordinary named methods can be asynchronous: 


async Task NamedMethod()
{
  await Task.Delay (1000);
  Console.WriteLine ("Foo");
}


so can unnamed methods (lambda expressions and anonymous methods), if preceded
by the async keyword:


Func<Task> unnamed = async () =>
{
  await Task.Delay (1000);
  Console.WriteLine ("Foo");
};
We can call and await these in the same way: 


await NamedMethod(); 
await unnamed(); 


We can use asynchronous lambda expressions when attaching event handlers: 


myButton.Click += async (sender, args) =>
{
  await Task.Delay (1000);
  myButton.Content = "Done";
};
This is more succinct than the following, which has the same effect: 


myButton.Click += ButtonHandler;

async void ButtonHandler (object sender, EventArgs args)
{
  await Task.Delay (1000);
  myButton.Content = "Done";
}
Asynchronous lambda expressions can also return Task<TResult>: 


Func<Task<int>> unnamed = async () =>
{
  await Task.Delay (1000);
  return 123;
};


int answer = await unnamed(); 


Asynchronous Streams (C# 8) 


Prior to C# 8, you could use yield return to write an iterator, or await to write an 
asynchronous function. But you couldn't do both and write an iterator that awaits, 
yielding elements asynchronously. C# 8 fixes this through the introduction of
asynchronous streams.
Asynchronous streams build on the following pair of interfaces, which are asynchronous
counterparts to the enumeration interfaces we described in “Enumeration
and Iterators” on page 179 in Chapter 4: 


public interface IAsyncEnumerable<out T>
{
  IAsyncEnumerator<T> GetAsyncEnumerator (...);
}

public interface IAsyncEnumerator<out T>: IAsyncDisposable
{
  T Current { get; }
  ValueTask<bool> MoveNextAsync();
}


ValueTask<T> is a struct that wraps Task<T> and is behaviorally similar to Task<T> 
while enabling more efficient execution when the task completes synchronously 
(which can happen often when enumerating a sequence). See “ValueTask<T>” on 
page 623 for a discussion of differences. IAsyncDisposable is an asynchronous version
of IDisposable; it provides an opportunity to perform cleanup should you
choose to manually implement the interfaces: 


public interface IAsyncDisposable
{
  ValueTask DisposeAsync();
}


The act of fetching each element from the sequence (MoveNextAsync)
is an asynchronous operation, so asynchronous
streams are suitable when elements arrive in a piecemeal fashion
(such as when processing data from a video stream). In
contrast, the following type is more suitable when the
sequence as a whole is delayed, but the elements, when they
arrive, arrive all together:


Task<IEnumerable<T>> 


To generate an asynchronous stream, you write a method that combines the principles
of iterators and asynchronous methods. In other words, your method should
include both yield return and await, and it should return IAsyncEnumerable<T>: 


async IAsyncEnumerable<int> RangeAsync (
  int start, int count, int delay)
{
  for (int i = start; i < start + count; i++)
  {
    await Task.Delay (delay);
    yield return i;
  }
}


To consume an asynchronous stream, use the await foreach statement: 


await foreach (var number in RangeAsync (0, 10, 500)) 
Console.WriteLine (number); 
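To make the mechanics concrete, here is a sketch of consuming RangeAsync by hand, using the interface members listed earlier. It approximates what the compiler does for await foreach: obtain an enumerator, await MoveNextAsync until it returns false, then call DisposeAsync. The ManualAsyncEnumeration class and ConsumeAsync method are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ManualAsyncEnumeration
{
  static async IAsyncEnumerable<int> RangeAsync (int start, int count, int delay)
  {
    for (int i = start; i < start + count; i++)
    {
      await Task.Delay (delay);
      yield return i;
    }
  }

  // Roughly the shape of the code that await foreach expands into.
  public static async Task<List<int>> ConsumeAsync (int start, int count, int delay)
  {
    var received = new List<int>();
    IAsyncEnumerator<int> enumerator = RangeAsync (start, count, delay).GetAsyncEnumerator();
    try
    {
      while (await enumerator.MoveNextAsync())
      {
        Console.WriteLine (enumerator.Current);   // Each element prints as it arrives
        received.Add (enumerator.Current);
      }
    }
    finally
    {
      await enumerator.DisposeAsync();            // Asynchronous cleanup
    }
    return received;
  }

  static async Task Main()
  {
    await ConsumeAsync (0, 10, 500);
  }
}
```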
Note that data arrives steadily, every 500 milliseconds (or, in real life, as it becomes
available). Contrast this to a similar construct using Task<IEnumerable<T>> for 
which no data is returned until the last piece of data is available: 


static async Task<IEnumerable<int>> RangeTaskAsync (int start, int count,
                                                     int delay)
{
  List<int> data = new List<int>();
  for (int i = start; i < start + count; i++)
  {
    await Task.Delay (delay);
    data.Add (i);
  }
  return data;
}


Here’s how to consume it with the foreach statement: 


foreach (var data in await RangeTaskAsync(0, 10, 500)) 
Console.WriteLine (data); 


Querying IAsyncEnumerable<T>


The System.Linq.Async NuGet package defines LINQ query operators that operate
over IAsyncEnumerable<T>, allowing you to write queries much as you would with 
IEnumerable<T>. 


For instance, we can write a LINQ query over the RangeAsync method that we 
defined in the preceding section, as follows: 


IAsyncEnumerable<int> query =
  from i in RangeAsync (0, 10, 500)
  where i % 2 == 0   // Even numbers only.
  select i * 10;     // Multiply by 10.


await foreach (var number in query) 
Console.WriteLine (number); 


This outputs 0, 20, 40, and so on. 
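If you'd rather not take a dependency on the package, simple operators are easy to sketch with asynchronous iterators. WhereSketch, SelectSketch, and ToListSketch below are hypothetical helpers, not the System.Linq.Async implementation. Because they aren't named Where and Select, query syntax won't bind to them, so the query is written with method syntax instead:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class AsyncQuerySketch
{
  // Hypothetical filtering operator built from an async iterator.
  public static async IAsyncEnumerable<T> WhereSketch<T> (
    this IAsyncEnumerable<T> source, Func<T, bool> predicate)
  {
    await foreach (T item in source)
      if (predicate (item))
        yield return item;
  }

  // Hypothetical projection operator built from an async iterator.
  public static async IAsyncEnumerable<TResult> SelectSketch<T, TResult> (
    this IAsyncEnumerable<T> source, Func<T, TResult> selector)
  {
    await foreach (T item in source)
      yield return selector (item);
  }

  public static async IAsyncEnumerable<int> RangeAsync (int start, int count, int delay)
  {
    for (int i = start; i < start + count; i++)
    {
      await Task.Delay (delay);
      yield return i;
    }
  }

  // Hypothetical helper that drains a sequence into a list.
  public static async Task<List<T>> ToListSketch<T> (IAsyncEnumerable<T> source)
  {
    var list = new List<T>();
    await foreach (T item in source) list.Add (item);
    return list;
  }

  static async Task Main()
  {
    IAsyncEnumerable<int> query = RangeAsync (0, 10, 500)
      .WhereSketch (i => i % 2 == 0)    // Even numbers only
      .SelectSketch (i => i * 10);      // Multiply by 10

    await foreach (var number in query)
      Console.WriteLine (number);       // 0, 20, 40, 60, 80
  }
}
```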


If you’re familiar with Reactive Extensions, you can benefit
from its (more powerful) query operators, too, by calling the
ToObservable extension method, which converts an
IAsyncEnumerable<T> into an IObservable<T>. A ToAsyncEnumerable
extension method is also available, to convert in
the reverse direction.


IAsyncEnumerable<T> in ASP.NET Core


ASP.NET Core controller actions can now return IAsyncEnumerable<T>. Such methods
must be marked async. For example:
[HttpGet]
public async IAsyncEnumerable<string> Get()
{
  using var dbContext = new BookContext();
  await foreach (var title in dbContext.Books
                                       .Select (b => b.Title)
                                       .AsAsyncEnumerable())
    yield return title;
}


Asynchronous Methods in WinRT 


In WinRT libraries, the equivalent of Task is IAsyncAction and the equivalent of
Task<TResult> is IAsyncOperation<TResult>. And for operations that report progress,
the equivalents are IAsyncActionWithProgress<TProgress> and
IAsyncOperationWithProgress<TResult, TProgress>. They are all defined in the
Windows.Foundation namespace.


You can convert from any of these into a Task or Task<TResult> via the AsTask extension
method: 


Task<StorageFile> fileTask = KnownFolders.DocumentsLibrary.CreateFileAsync
                             ("test.txt").AsTask();


Or, you can await them directly: 


StorageFile file = await KnownFolders.DocumentsLibrary.CreateFileAsync 
("test.txt"); 


Due to limitations in the COM type system,
IAsyncOperation<TResult> and IAsyncOperationWithProgress
<TResult, TProgress> are not based on IAsyncAction as you might
expect. Instead, both inherit from a common base type called
IAsyncInfo.


The AsTask method is also overloaded to accept a cancellation token (see “Cancellation”
on page 625). It can also accept an IProgress<T> object when chained to the
WithProgress variants (see “Progress Reporting” on page 627). 


Asynchrony and Synchronization Contexts 


We've already seen how the presence of a synchronization context is significant in
terms of posting continuations. There are a couple of other more subtle ways in
which such synchronization contexts come into play with void-returning asynchronous
functions. These are not a direct result of C# compiler expansions, but a function
of the Async*MethodBuilder types in the System.Runtime.CompilerServices
namespace that the compiler uses in expanding asynchronous functions.


Exception posting 


It's common practice in rich-client applications to rely on the central exception handling
event (Application.DispatcherUnhandledException in WPF) to process
unhandled exceptions thrown on the UI thread. And in ASP.NET Core applications,
a custom ExceptionFilterAttribute in the ConfigureServices method of 
Startup.cs does a similar job. Internally, they work by invoking UI events (or in 
ASP.NET Core, the pipeline of page-processing methods) in their own try/catch 
block. 


Top-level asynchronous functions complicate this. Consider the following event 
handler for a button click: 


async void ButtonClick (object sender, RoutedEventArgs args)
{
  await Task.Delay (1000);
  throw new Exception ("Will this be ignored?");
}
When the button is clicked and the event handler runs, execution returns normally 
to the message loop after the await statement, and the exception that’s thrown a second
later cannot be caught by the catch block in the message loop.


To mitigate this problem, AsyncVoidMethodBuilder catches unhandled exceptions 
(in void-returning asynchronous functions) and posts them to the synchronization 
context if present, ensuring that global exception-handling events still fire. 


The compiler applies this logic only to void-returning asynchronous
functions. So, if we changed ButtonClick to return
a Task instead of void, the unhandled exception would fault 
the resultant Task, which would then have nowhere to go 
(resulting in an unobserved exception). 


An interesting nuance is that it makes no difference whether you throw before or 
after an await. Thus, in the following example, the exception is posted to the synchronization
context (if present) and never to the caller:


async void Foo() { throw null; await Task.Delay(1000); } 


(If no synchronization context is present, the exception will propagate on the thread 
pool, which will terminate the application.) 


The reason for the exception not being thrown directly back to the caller is to 
ensure predictability and consistency. In the following example, the
InvalidOperationException will always have the same effect of faulting the resultant
Task, regardless of someCondition:


async Task Foo()
{
  if (someCondition) await Task.Delay (100);
  throw new InvalidOperationException();
} 
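The following runnable sketch checks this behavior; the bool parameter is an assumption that stands in for someCondition, and the class name is illustrative. With false, the exception is thrown before any await, yet the call itself doesn't throw: the returned task is simply already faulted. With true, the exception surfaces only when the task is awaited:

```csharp
using System;
using System.Threading.Tasks;

static class DeferredExceptions
{
  public static async Task Foo (bool someCondition)
  {
    if (someCondition) await Task.Delay (100);
    throw new InvalidOperationException();
  }

  static async Task Main()
  {
    Task fast = Foo (false);              // No exception at the call site...
    Console.WriteLine (fast.IsFaulted);   // True: ...the task is already faulted

    Task slow = Foo (true);               // No exception here either
    try { await slow; }
    catch (InvalidOperationException) { Console.WriteLine ("Caught on await"); }
  }
}
```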


Iterators work in a similar way: 


IEnumerable<int> Foo() { throw null; yield return 123; }
In this example, an exception is never thrown straight back to the caller: not until
the sequence is enumerated is the exception thrown. 
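A quick sketch that confirms the deferred behavior (the class name is illustrative, and the compiler will warn that the yield return statement is unreachable):

```csharp
using System;
using System.Collections.Generic;

static class LazyIteratorException
{
  public static IEnumerable<int> Foo() { throw null; yield return 123; }

  static void Main()
  {
    IEnumerable<int> sequence = Foo();   // No exception: the body hasn't started
    try
    {
      foreach (int n in sequence) Console.WriteLine (n);
    }
    catch (NullReferenceException)       // "throw null" throws NullReferenceException
    {
      Console.WriteLine ("Thrown only on enumeration");
    }
  }
}
```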


OperationStarted and OperationCompleted 


If a synchronization context is present, void-returning asynchronous functions also 
call its OperationStarted method upon entering the function, and its OperationCompleted
method when the function finishes.


Overriding these methods is useful if writing a custom synchronization context for 
unit testing void-returning asynchronous methods. This is discussed on Microsoft's 
Parallel Programming blog. 
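As a minimal sketch of the idea, assuming a simple thread-pool-based design (this is not the implementation from the Parallel Programming blog), the hypothetical CountingSyncContext below counts outstanding async void calls through OperationStarted and OperationCompleted, treats queued callbacks as pending work, and records exceptions that the async void method builder posts to it. A test can then wait until everything has finished and inspect the errors:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical test-only context: counts pending work and captures exceptions.
class CountingSyncContext : SynchronizationContext
{
  int _pending;
  readonly ManualResetEventSlim _idle = new ManualResetEventSlim (true);

  public ConcurrentQueue<Exception> Errors { get; } = new ConcurrentQueue<Exception>();

  // Called when an async void method starts (and for each queued callback).
  public override void OperationStarted()
  {
    if (Interlocked.Increment (ref _pending) == 1) _idle.Reset();
  }

  // Called when an async void method finishes (and after each callback runs).
  public override void OperationCompleted()
  {
    if (Interlocked.Decrement (ref _pending) == 0) _idle.Set();
  }

  // Continuations, and exceptions rethrown on behalf of async void methods,
  // arrive here. Counting them as pending stops WaitForIdle returning early.
  public override void Post (SendOrPostCallback d, object state)
  {
    OperationStarted();
    ThreadPool.QueueUserWorkItem (_ =>
    {
      SetSynchronizationContext (this);   // Later awaits post back here too
      try { d (state); }
      catch (Exception ex) { Errors.Enqueue (ex); }
      finally
      {
        SetSynchronizationContext (null);
        OperationCompleted();
      }
    });
  }

  public bool WaitForIdle (TimeSpan timeout) => _idle.Wait (timeout);
}

static class AsyncVoidTestDemo
{
  public static async void ThrowLater()
  {
    await Task.Delay (50);
    throw new InvalidOperationException ("Boom");
  }

  // Invokes an async void method with the test context installed.
  public static CountingSyncContext RunUnderTestContext (Action asyncVoidCall)
  {
    var context = new CountingSyncContext();
    var previous = SynchronizationContext.Current;
    SynchronizationContext.SetSynchronizationContext (context);
    try { asyncVoidCall(); }
    finally { SynchronizationContext.SetSynchronizationContext (previous); }
    return context;
  }

  static void Main()
  {
    CountingSyncContext ctx = RunUnderTestContext (ThrowLater);
    ctx.WaitForIdle (TimeSpan.FromSeconds (5));
    foreach (Exception ex in ctx.Errors) Console.WriteLine (ex.Message);   // Boom
  }
}
```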


Optimizations 


Completing synchronously 


An asynchronous function can return before awaiting. Consider the following 
method that caches the downloading of web pages: 


static Dictionary<string,string> _cache = new Dictionary<string,string>();

async Task<string> GetWebPageAsync (string uri)
{
  string html;
  if (_cache.TryGetValue (uri, out html)) return html;
  return _cache [uri] =
    await new WebClient().DownloadStringTaskAsync (uri);
}


Should a URI already exist in the cache, execution returns to the caller with no 
awaiting having occurred, and the method returns an already-signaled task. This is 
referred to as synchronous completion. 
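Here is a self-contained sketch that makes synchronous completion observable. FakeDownloadAsync is a hypothetical stand-in for WebClient, so no network access is needed. The first call returns a task that's still running, whereas the second call is served from the cache and returns a task whose IsCompleted property is already true:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class SyncCompletionDemo
{
  static readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

  // Hypothetical stand-in for a real download: waits, then returns fake HTML.
  static async Task<string> FakeDownloadAsync (string uri)
  {
    await Task.Delay (100);
    return "<html>" + uri + "</html>";
  }

  public static async Task<string> GetWebPageAsync (string uri)
  {
    if (_cache.TryGetValue (uri, out string html)) return html;   // Completes synchronously
    return _cache [uri] = await FakeDownloadAsync (uri);
  }

  static async Task Main()
  {
    Task<string> first = GetWebPageAsync ("http://oreilly.com");
    Console.WriteLine (first.IsCompleted);    // False: still downloading
    await first;

    Task<string> second = GetWebPageAsync ("http://oreilly.com");
    Console.WriteLine (second.IsCompleted);   // True: served from the cache
  }
}
```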


When you await a synchronously completed task, execution does not return to the 
caller and bounce back via a continuation; instead, it proceeds immediately to the 
next statement. The compiler implements this optimization by checking the IsCompleted
property on the awaiter; in other words, whenever you await:


Console.WriteLine (await GetWebPageAsync ("http://oreilly.com")); 


the compiler emits code to short-circuit the continuation in case of synchronous
completion:


var awaiter = GetWebPageAsync ("http://oreilly.com").GetAwaiter();
if (awaiter.IsCompleted)
  Console.WriteLine (awaiter.GetResult());
else
  awaiter.OnCompleted (() => Console.WriteLine (awaiter.GetResult()));
Awaiting an asynchronous function that returns synchronously
still incurs a (very) small overhead, maybe 20 nanoseconds
on a 2019-era PC.


In contrast, bouncing to the thread pool introduces the cost of 
a context switch—perhaps one or two microseconds, and 
bouncing to a UI message loop, at least 10 times that (much 
longer if the UI thread is busy). 


It's even legal to write asynchronous methods that never await, although the compiler
will generate a warning:


async Task<string> Foo() { return "abc"; } 

Such methods can be useful when overriding virtual/abstract methods, if your
implementation doesn’t happen to need asynchrony. (An example is MemoryStream’s
ReadAsync/WriteAsync methods; see Chapter 15.) Another way to achieve the same
result is to use Task.FromResult, which returns an already signaled task:

Task<string> Foo() { return Task.FromResult ("abc"); }


Our GetWebPageAsync method is implicitly thread-safe if called from a UI thread, in
that you could invoke it several times in succession (thereby initiating multiple concurrent
downloads), and no locking is required to protect the cache. If the series of
calls were to the same URI, though, we’d end up initiating multiple redundant
downloads, all of which would eventually update the same cache entry (the last one
winning). Although not erroneous, it would be more efficient if subsequent calls to
the same URI could instead (asynchronously) wait upon the result of the in-progress
request.


There’s an easy way to accomplish this, without resorting to locks or signaling constructs.
Instead of a cache of strings, we create a cache of “futures” (Task<string>):


static Dictionary<string, Task<string>> _cache =
  new Dictionary<string, Task<string>>();

Task<string> GetWebPageAsync (string uri)
{
  if (_cache.TryGetValue (uri, out var downloadTask)) return downloadTask;
  return _cache [uri] = new WebClient().DownloadStringTaskAsync (uri);
}


(Notice that we don’t mark the method as async, because we’re directly returning
the task we obtain from calling WebClient’s method.)


If we call GetWebPageAsync repeatedly with the same URI, we're now guaranteed to 
get the same Task<string> object back. (This has the additional benefit of minimizing
garbage collection load.) And if the task is complete, awaiting it is cheap, thanks
to the compiler optimization that we just discussed. 


We could further extend our example to make it thread-safe without the protection 
of a synchronization context, by locking around the entire method body: 
Task<string> GetWebPageAsync (string uri)
{
  lock (_cache)
    if (_cache.TryGetValue (uri, out var downloadTask))
      return downloadTask;
    else
      return _cache [uri] = new WebClient().DownloadStringTaskAsync (uri);
}


This works because we're not locking for the duration of downloading a page 
(which would hurt concurrency); we're locking for the small duration of checking 
the cache, starting a new task if necessary, and updating the cache with that task. 
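The following sketch exercises the locking version from many pooled threads at once. FakeDownloadAsync is a hypothetical stand-in for DownloadStringTaskAsync that counts how many downloads were started. If the cache of futures behaves as described, 50 concurrent requests for the same URI start a single download and share one task:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class FutureCache
{
  static readonly Dictionary<string, Task<string>> _cache =
    new Dictionary<string, Task<string>>();

  static int _downloadsStarted;   // For illustration only
  public static int DownloadsStarted => Volatile.Read (ref _downloadsStarted);

  // Hypothetical stand-in for WebClient.DownloadStringTaskAsync.
  static async Task<string> FakeDownloadAsync (string uri)
  {
    Interlocked.Increment (ref _downloadsStarted);
    await Task.Delay (200);
    return "<html>" + uri + "</html>";
  }

  // The lock is held only while checking the cache and starting the task,
  // not for the duration of the download.
  public static Task<string> GetWebPageAsync (string uri)
  {
    lock (_cache)
      if (_cache.TryGetValue (uri, out var downloadTask))
        return downloadTask;
      else
        return _cache [uri] = FakeDownloadAsync (uri);
  }

  static async Task Main()
  {
    Task<string>[] pages = Enumerable.Range (0, 50)
      .Select (_ => Task.Run (() => GetWebPageAsync ("http://oreilly.com")))
      .ToArray();
    await Task.WhenAll (pages);
    Console.WriteLine (DownloadsStarted);   // 1
  }
}
```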


ValueTask<T> 


We just described how the compiler optimizes an await expression on a synchronously
completed task, by short-circuiting the continuation and proceeding immediately
to the next statement. If the synchronous completion is due to caching, we
saw that caching the task itself can provide an elegant and efficient solution. 


ValueTask<T> is intended for micro-optimization scenarios, 
and you might never need to write methods that return this 
type. However, it still pays to be aware of the precautions that 
we outline in the next section because some .NET Core methods
return ValueTask<T>, and IAsyncEnumerable<T> makes
use of it, too.


It's not practical, however, to cache the task in all synchronous completion scenarios.
Sometimes, a fresh task must be instantiated, and this creates a (tiny) potential
inefficiency. This is because Task and Task<T> are reference types, and so instantiation
requires a heap-based memory allocation and subsequent collection. An
extreme form of optimization is to write code that’s allocation-free; in other words,
that does not instantiate any reference types, adding no burden to garbage collection.
To support this pattern, the ValueTask and ValueTask<T> structs have been
introduced, which the compiler allows in place of Task and Task<T>:


async ValueTask<int> Foo() { ... } 
Awaiting ValueTask<T> is allocation-free, if the operation completes synchronously: 
int answer = await Foo(); // (Potentially) allocation-free 


If the operation doesn’t complete synchronously, ValueTask<T> creates an ordinary 
Task<T> behind the scenes (to which it forwards the await), and nothing is gained. 


You can convert a ValueTask<T> into an ordinary Task<T> by calling the AsTask 
method. 


There’s also a nongeneric version—ValueTask—which is akin to Task. 
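As a sketch of where ValueTask<T> pays off, assuming a simple in-memory cache: the hypothetical GetSquareAsync method wraps the cached int directly on a hit, so no Task<int> is allocated, and wraps an ordinary Task<int> on a miss. It isn't marked async, which lets it construct the ValueTask<int> explicitly in each case:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ValueTaskSketch
{
  static readonly Dictionary<int, int> _cache = new Dictionary<int, int>();

  // Hypothetical slow path: pretends to compute the value asynchronously.
  static async Task<int> ComputeSlowlyAsync (int x)
  {
    await Task.Delay (50);
    return _cache [x] = x * x;
  }

  public static ValueTask<int> GetSquareAsync (int x) =>
    _cache.TryGetValue (x, out int cached)
      ? new ValueTask<int> (cached)                   // Hit: wraps the int itself
      : new ValueTask<int> (ComputeSlowlyAsync (x));  // Miss: wraps a Task<int>

  static async Task Main()
  {
    int miss = await GetSquareAsync (7);             // Slow path
    ValueTask<int> hit = GetSquareAsync (7);         // Served from the cache
    Console.WriteLine (hit.IsCompletedSuccessfully); // True
    Console.WriteLine (miss + await hit);            // 98 (hit is awaited only once)
  }
}
```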


Precautions when using ValueTask<T> 


ValueTask<T> is relatively unusual in that it’s defined as a struct purely for performance
reasons. This means that it's encumbered with inappropriate value-type
semantics, which can lead to surprises. To avoid incorrect behavior, you must avoid
the following: 


• Awaiting the same ValueTask<T> multiple times


• Calling .GetAwaiter().GetResult() when the operation hasn't completed


If you need to perform these actions, call .AsTask() and operate instead on the 
resulting Task. 


The easiest way to avoid these traps is to directly await a 
method call, for instance: 

await Foo(); // Safe 
The door to erroneous behavior opens when you assign the 
(value) task to a variable: 


ValueTask<int> valueTask = Foo(); // Caution! 
// Our use of valueTask can now lead to errors. 


which can be mitigated by converting immediately to an ordinary
task:


Task<int> task = Foo().AsTask(); // Safe 
// task is safe to work with. 


Avoiding excessive bouncing 


For methods that are called many times in a loop, you can avoid the cost of repeatedly
bouncing to a UI message loop by calling ConfigureAwait. This forces a task
not to bounce continuations to the synchronization context, cutting the overhead
closer to the cost of a context switch (or much less if the method that you’re awaiting
completes synchronously):


async void A() { ... await B(); ... }

async Task B()
{
  for (int i = 0; i < 1000; i++)
    await C().ConfigureAwait (false);
}

async Task C() { ... }


This means that for the B and C methods, we rescind the simple thread-safety model 
in UI apps whereby code runs on the UI thread and can be preempted only during 
an await statement. Method A, however, is unaffected and will remain on a UI 
thread if it started on one. 


This optimization is particularly relevant when writing libraries: you don't need the 
benefit of simplified thread safety because your code typically does not share state 
with the caller—and does not access UI controls. (It would also make sense, in our 
example, for method C to complete synchronously if it knew the operation was 
likely to be short-running.) 
Asynchronous Patterns


Cancellation 


It's often important to be able to cancel a concurrent operation after it’s started,
perhaps in response to a user request. A simple way to implement this is with a
cancellation flag, which we could encapsulate by writing a class like this:


class CancellationToken
{
  public bool IsCancellationRequested { get; private set; }
  public void Cancel() { IsCancellationRequested = true; }
  public void ThrowIfCancellationRequested()
  {
    if (IsCancellationRequested)
      throw new OperationCanceledException();
  }
}


We could then write a cancellable asynchronous method as follows: 


async Task Foo (CancellationToken cancellationToken)
{
  for (int i = 0; i < 10; i++)
  {
    Console.WriteLine (i);
    await Task.Delay (1000);
    cancellationToken.ThrowIfCancellationRequested();
  }
}


When the caller wants to cancel, it calls Cancel on the cancellation token that it 
passed into Foo. This sets IsCancellationRequested to true, which causes Foo to 
fault a short time later with an OperationCanceledException (a predefined exception
in the System namespace designed for this purpose).


Thread safety aside (we should be locking around reading/writing IsCancellationRequested),
this pattern is effective and the CLR provides a type called
CancellationToken, which is very similar to what we've just shown. However, it
lacks a Cancel method; this method is instead exposed on another type called
CancellationTokenSource. This separation provides some security: a method that has
access only to a CancellationToken object can check for but not initiate 
cancellation. 


To get a cancellation token, we first instantiate a CancellationTokenSource: 
var cancelSource = new CancellationTokenSource(); 


This exposes a Token property, which returns a CancellationToken. Hence, we 
could call our Foo method as follows: 


var cancelSource = new CancellationTokenSource();
Task foo = Foo (cancelSource.Token);
... (some time later)
cancelSource.Cancel();


Most asynchronous methods in the CLR support cancellation tokens, including 
Delay. If we modify Foo such that it passes its token into the Delay method, the task 
will end immediately upon request (rather than up to a second later): 


async Task Foo (CancellationToken cancellationToken)
{
  for (int i = 0; i < 10; i++)
  {
    Console.WriteLine (i);
    await Task.Delay (1000, cancellationToken);
  }
}


Notice that we no longer need to call ThrowIfCancellationRequested, because
Task.Delay is doing that for us. Cancellation tokens propagate nicely down the call
stack (just as cancellation requests cascade up the call stack, by virtue of being 
exceptions). 


Asynchronous methods in WinRT follow an inferior protocol
for cancellation whereby instead of accepting a CancellationToken,
the IAsyncInfo type exposes a Cancel method. The
AsTask extension method is overloaded to accept a cancellation
token, however, bridging the gap.


Synchronous methods can support cancellation, too (such as Task’s Wait method). 
In such cases, the instruction to cancel will need to come asynchronously (e.g., from 
another task); for example: 


var cancelSource = new CancellationTokenSource(); 
Task.Delay (5000).ContinueWith (ant => cancelSource.Cancel()); 
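Here is a runnable version of that idea. Task's Wait method has an overload that accepts a CancellationToken: the calling thread blocks until the continuation cancels the token, at which point Wait throws an OperationCanceledException. The class and method names are illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SyncWaitCancellation
{
  public static bool WaitWasCanceled (int cancelAfterMs)
  {
    var cancelSource = new CancellationTokenSource();

    // The instruction to cancel arrives asynchronously, from a continuation.
    Task.Delay (cancelAfterMs).ContinueWith (ant => cancelSource.Cancel());

    Task neverEnding = Task.Delay (Timeout.Infinite);
    try
    {
      neverEnding.Wait (cancelSource.Token);   // Blocks this thread
      return false;
    }
    catch (OperationCanceledException) { return true; }
  }

  static void Main() =>
    Console.WriteLine (WaitWasCanceled (1000) ? "Canceled" : "Completed");   // Canceled
}
```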


In fact, you can specify a time interval when constructing CancellationTokenSource
to initiate cancellation after a set period of time (just as we demonstrated).
It's useful for implementing timeouts, whether synchronous or asynchronous: 


var cancelSource = new CancellationTokenSource (5000); 
try { await Foo (cancelSource.Token); } 
catch (OperationCanceledException ex) { Console.WriteLine ("Cancelled"); } 


The CancellationToken struct provides a Register method that lets you register a 
callback delegate that will be fired upon cancellation; it returns an object that can be 
disposed to undo the registration. 
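A short sketch of Register in action (the class and method names are illustrative). Disposing a registration before cancellation means its callback never runs; callbacks that are still registered are invoked synchronously by Cancel:

```csharp
using System;
using System.Threading;

static class RegisterDemo
{
  public static (bool FirstFired, bool SecondFired) Run()
  {
    using var cancelSource = new CancellationTokenSource();
    bool first = false, second = false;

    CancellationTokenRegistration reg1 = cancelSource.Token.Register (() => first = true);
    CancellationTokenRegistration reg2 = cancelSource.Token.Register (() => second = true);

    reg2.Dispose();          // Undo the second registration before cancelling
    cancelSource.Cancel();   // Runs the remaining callback on this thread
    reg1.Dispose();          // Harmless after the callback has run

    return (first, second);
  }

  static void Main() => Console.WriteLine (Run());   // (True, False)
}
```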


Tasks generated by the compiler’s asynchronous functions automatically enter a
Canceled state upon an unhandled OperationCanceledException (IsCanceled
returns true and IsFaulted returns false). The same goes for tasks created with
Task.Run when you pass the (same) CancellationToken as an argument.
The distinction between a faulted and a canceled task is unimportant in asynchronous
scenarios, in that both throw an OperationCanceledException when awaited;
it matters in advanced parallel programming scenarios (specifically conditional
continuations). We pick up this topic in “Canceling Tasks” on page 949. 
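The following sketch checks both cases (the class and method names are illustrative). Each task ends up Canceled rather than Faulted, and awaiting it throws an OperationCanceledException:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CanceledStateDemo
{
  static async Task FooAsync (CancellationToken token)
  {
    await Task.Delay (10);
    token.ThrowIfCancellationRequested();   // Unhandled OperationCanceledException
  }

  // Final state of a task generated by the compiler.
  public static async Task<(bool IsCanceled, bool IsFaulted)> AsyncMethodState()
  {
    var cancelSource = new CancellationTokenSource();
    cancelSource.Cancel();
    Task task = FooAsync (cancelSource.Token);
    try { await task; }
    catch (OperationCanceledException) { }   // Awaiting rethrows it
    return (task.IsCanceled, task.IsFaulted);
  }

  // Final state of a Task.Run task that was given the same token.
  public static async Task<(bool IsCanceled, bool IsFaulted)> TaskRunState()
  {
    var cancelSource = new CancellationTokenSource();
    Task task = Task.Run (() =>
    {
      cancelSource.Cancel();
      cancelSource.Token.ThrowIfCancellationRequested();
    }, cancelSource.Token);
    try { await task; }
    catch (OperationCanceledException) { }
    return (task.IsCanceled, task.IsFaulted);
  }

  static async Task Main()
  {
    Console.WriteLine (await AsyncMethodState());   // (True, False)
    Console.WriteLine (await TaskRunState());       // (True, False)
  }
}
```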


Progress Reporting 


Sometimes, you'll want an asynchronous operation to report back progress as it’s 
running. A simple solution is to pass an Action delegate to the asynchronous 
method, which the method fires whenever progress changes: 


Task Foo (Action<int> onProgressPercentChanged)
{
  return Task.Run (() =>
  {
    for (int i = 0; i < 1000; i++)
    {
      if (i % 10 == 0) onProgressPercentChanged (i / 10);
      // Do something compute-bound...
    }
  });
}


Here’s how we could call it: 


Action<int> progress = i => Console.WriteLine (i + " %"); 
await Foo (progress); 


Although this works well in a Console application, it’s not ideal in rich-client
scenarios because it reports progress from a worker thread, causing potential
thread-safety issues for the consumer. (In effect, we've allowed a side effect of concurrency
to leak to the outside world, which is unfortunate given that the method is
otherwise isolated if called from a UI thread.)


IProgress<T> and Progress<T> 


The CLR provides a pair of types to solve this problem: an interface called 
IProgress<T> and a class that implements this interface called Progress<T>. Their 
purpose, in effect, is to wrap a delegate so that UI applications can report progress 
safely through the synchronization context. 


The interface defines just one method: 


public interface IProgress<in T>
{
  void Report (T value);
}


Using IProgress<T> is easy; our method hardly changes:


Task Foo (IProgress<int> onProgressPercentChanged)
{
  return Task.Run (() =>
  {
    for (int i = 0; i < 1000; i++)
    {
      if (i % 10 == 0) onProgressPercentChanged.Report (i / 10);
      // Do something compute-bound...
    }
  });
}


The Progress<T> class has a constructor that accepts a delegate of type Action<T> 
that it wraps: 


var progress = new Progress<int> (i => Console.WriteLine (i + " %")); 
await Foo (progress); 


(Progress<T> also has a ProgressChanged event that you can subscribe to instead of 
[or in addition to] passing an action delegate to the constructor.) Upon instantiating 
Progress<int>, the class captures the synchronization context, if present. When 
Foo then calls Report, the delegate is invoked through that context. 


Asynchronous methods can implement more elaborate progress reporting by 
replacing int with a custom type that exposes a range of properties. 
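As a sketch of that idea, DownloadProgress below is a hypothetical progress type carrying several properties. In a UI app you'd typically pass a Progress<DownloadProgress>. Here, a hypothetical ListProgress<T> that implements IProgress<T> by simply recording each report is used instead, because without a synchronization context Progress<T> invokes its handler asynchronously on the thread pool, which would make the output harder to check:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical progress payload exposing several properties.
struct DownloadProgress
{
  public DownloadProgress (long bytesReceived, long totalBytes)
  {
    BytesReceived = bytesReceived;
    TotalBytes = totalBytes;
  }
  public long BytesReceived { get; }
  public long TotalBytes { get; }
  public int Percent => (int) (BytesReceived * 100 / TotalBytes);
}

// Hypothetical IProgress<T> that records reports synchronously.
class ListProgress<T> : IProgress<T>
{
  public List<T> Reports { get; } = new List<T>();
  public void Report (T value) => Reports.Add (value);
}

static class ProgressDemo
{
  // Hypothetical operation that reports after each 1,000-byte chunk.
  public static Task FooAsync (long totalBytes, IProgress<DownloadProgress> progress) =>
    Task.Run (() =>
    {
      const long chunkSize = 1000;
      for (long received = chunkSize; received <= totalBytes; received += chunkSize)
      {
        // ... process one chunk (compute-bound work would go here) ...
        progress.Report (new DownloadProgress (received, totalBytes));
      }
    });

  static async Task Main()
  {
    var progress = new ListProgress<DownloadProgress>();
    await FooAsync (5000, progress);
    foreach (DownloadProgress p in progress.Reports)
      Console.WriteLine ($"{p.BytesReceived}/{p.TotalBytes} bytes ({p.Percent} %)");
  }
}
```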


If you’re familiar with Reactive Framework, you’ll notice that
IProgress<T> together with the task returned by the asynchronous
function provide a feature set similar to IObserver<T>.
The difference is that a task can expose a “final” return
value in addition to (and differently typed to) the values emitted
by IProgress<T>.


Values emitted by IProgress<T> are typically “throwaway”
values (e.g., percent complete or bytes downloaded so far),
whereas values pushed by IObserver<T>’s OnNext typically
comprise the result itself and are the very reason for calling it.


Asynchronous methods in WinRT also offer progress reporting, although the protocol is complicated by COM’s (relatively) limited type system. Instead of accepting an IProgress<T> object, asynchronous WinRT methods that report progress return one of the following interfaces, in place of IAsyncAction and IAsyncOperation<TResult>:


IAsyncActionWithProgress<TProgress>
IAsyncOperationWithProgress<TResult, TProgress>


Interestingly, both are based on IAsyncInfo (not IAsyncAction and IAsyncOperation<TResult>).


The good news is that the AsTask extension method is also overloaded to accept 
IProgress<T> for the aforementioned interfaces, so as a .NET consumer, you can 
ignore the COM interfaces and do this: 


var progress = new Progress<int> (i => Console.WriteLine (i + " %")); 
CancellationToken cancelToken = ... 
var task = someWinRTobject.FooAsync().AsTask (cancelToken, progress); 
The Task-Based Asynchronous Pattern


.NET Core exposes hundreds of task-returning asynchronous methods that you can await (mainly related to I/O). Most of these methods (at least partly) follow a pattern called the Task-Based Asynchronous Pattern (TAP), which is a sensible formalization of what we have described to date. A TAP method does the following:


• Returns a “hot” (running) Task or Task<TResult>
• Has an “Async” suffix (except for special cases such as task combinators)
• Is overloaded to accept a cancellation token and/or IProgress<T> if it supports cancellation and/or progress reporting
• Returns quickly to the caller (has only a small initial synchronous phase)
• Does not tie up a thread if I/O-bound


As we've seen, TAP methods are easy to write with C#’s asynchronous functions. 
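To make the checklist concrete, here is a sketch of a hypothetical I/O-bound method shaped to follow it: it has an Async suffix, offers overloads with and without a CancellationToken and IProgress<T>, returns a running task, and awaits Stream.ReadAsync rather than blocking a thread. The names TapDemo and CountBytesAsync and the 4,096-byte buffer are illustrative assumptions, not .NET APIs.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class TapDemo
{
  // Convenience overload: no cancellation, no progress.
  public static Task<long> CountBytesAsync (Stream source) =>
    CountBytesAsync (source, CancellationToken.None, null);

  // Full overload: supports cancellation and progress reporting.
  public static async Task<long> CountBytesAsync (Stream source,
    CancellationToken cancelToken, IProgress<long> progress)
  {
    var buffer = new byte [4096];
    long total = 0;
    int read;
    // ReadAsync does not tie up a thread while waiting on I/O.
    while ((read = await source.ReadAsync (buffer, 0, buffer.Length, cancelToken)
                               .ConfigureAwait (false)) > 0)
    {
      total += read;
      progress?.Report (total);   // Bytes counted so far
    }
    return total;
  }
}
```

Because the method is written with async/await, a canceled token surfaces to the caller as an OperationCanceledException when the task is awaited.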


Task Combinators 


A nice consequence of there being a consistent protocol for asynchronous functions 
(whereby they consistently return tasks) is that it's possible to use and write task 
combinators—functions that usefully combine tasks, without regard for what those 
specific tasks do. 


The CLR includes two task combinators: Task.WhenAny and Task.WhenAll. In 
describing them, we'll assume the following methods are defined: 


async Task<int> Delay1() { await Task.Delay (1000); return 1; } 
async Task<int> Delay2() { await Task.Delay (2000); return 2; } 
async Task<int> Delay3() { await Task.Delay (3000); return 3; } 


WhenAny 


Task.WhenAny returns a task that completes when any one of a set of tasks completes. The following completes in one second:


Task<int> winningTask = await Task.WhenAny (Delay1(), Delay2(), Delay3());
Console.WriteLine ("Done");
Console.WriteLine (winningTask.Result);   // 1


Because Task.WhenAny itself returns a task, we await it, which returns the task that 
finished first. Our example is entirely nonblocking—including the last line when we 
access the Result property (because winningTask will already have finished). None- 
theless, it’s usually better to await the winningTask: 


Console.WriteLine (await winningTask); // 1 


because any exceptions are then rethrown without an AggregateException wrap- 
ping. In fact, we can perform both awaits in one step: 


int answer = await await Task.WhenAny (Delay1(), Delay2(), Delay3()); 
If a nonwinning task subsequently faults, the exception will go unobserved unless
you subsequently await the task (or query its Exception property). 
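One way to make sure such faults are observed is to attach a fault-only continuation to each losing task. The following is a sketch under that approach; the helper name FirstResultObservingLosers and its onFault callback are hypothetical, and what you do with the exception (logging it, for instance) is up to you.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAnyHelpers
{
  // Hypothetical helper: returns the first task's result, and ensures that any
  // later fault in the remaining tasks is observed by passing it to onFault.
  public static async Task<T> FirstResultObservingLosers<T> (
    Action<Exception> onFault, params Task<T>[] tasks)
  {
    Task<T> winner = await Task.WhenAny (tasks).ConfigureAwait (false);

    foreach (Task<T> loser in tasks.Where (t => t != winner))
      _ = loser.ContinueWith (ant => onFault (ant.Exception.InnerException),
                              TaskContinuationOptions.OnlyOnFaulted);

    return await winner.ConfigureAwait (false);   // Unwrap result/re-throw
  }
}
```

Reading the Exception property inside the continuation is what marks the fault as observed.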


WhenAny is useful for applying timeouts or cancellation to operations that don't 
otherwise support it: 


Task<string> task = SomeAsyncFunc(); 

Task winner = await (Task.WhenAny (task, Task.Delay(5000))); 
if (winner != task) throw new TimeoutException(); 

string result = await task; // Unwrap result/re-throw 


Notice that because in this case we're calling WhenAny with differently typed tasks, 
the winner is reported as a plain Task (rather than a Task<string>). 


WhenAll 


Task.WhenAll returns a task that completes when all of the tasks that you pass to it complete. The following completes after three seconds (and demonstrates the fork/join pattern):


await Task.WhenAll (Delay1(), Delay2(), Delay3());


We could get a similar result by awaiting task1, task2, and task3 in turn rather than using WhenAll:


Task task1 = Delay1(), task2 = Delay2(), task3 = Delay3();
await task1; await task2; await task3;


The difference (apart from it being less efficient by virtue of requiring three awaits 
rather than one) is that, should task1 fault, we'll never get to await task2/task3, 
and any of their exceptions will go unobserved. 


In contrast, Task.WhenAll doesn’t complete until all tasks have completed, even when there's a fault. And if there are multiple faults, their exceptions are combined into the task’s AggregateException (this is when AggregateException actually becomes useful, should you be interested in all the exceptions, that is). Awaiting the combined task, however, throws only the first exception, so to see all the exceptions you need to do this:


Task task1 = Task.Run (() => { throw null; } );
Task task2 = Task.Run (() => { throw null; } );
Task all = Task.WhenAll (task1, task2);
try { await all; }
catch
{
  Console.WriteLine (all.Exception.InnerExceptions.Count);   // 2
}


Calling WhenAll with tasks of type Task<TResult> returns a Task<TResult[]>, giving the combined results of all the tasks. This reduces to a TResult[] when awaited:


Task<int> task1 = Task.Run (() => 1);
Task<int> task2 = Task.Run (() => 2);
int[] results = await Task.WhenAll (task1, task2);   // { 1, 2 }
To give a practical example, the following downloads URIs in parallel and sums their total length:


async Task<int> GetTotalSize (string[] uris)
{
  IEnumerable<Task<byte[]>> downloadTasks = uris.Select (uri =>
    new WebClient().DownloadDataTaskAsync (uri));

  byte[][] contents = await Task.WhenAll (downloadTasks);
  return contents.Sum (c => c.Length);
}


There's a slight inefficiency here, though, in that we're unnecessarily hanging on to 
the byte arrays that we download until every task is complete. It would be more effi- 
cient if we collapsed byte arrays into their lengths immediately after downloading 
them. This is where an asynchronous lambda comes in handy because we need to 
feed an await expression into LINQ’s Select query operator: 


async Task<int> GetTotalSize (string[] uris)
{
  IEnumerable<Task<int>> downloadTasks = uris.Select (async uri =>
    (await new WebClient().DownloadDataTaskAsync (uri)).Length);

  int[] contentLengths = await Task.WhenAll (downloadTasks);
  return contentLengths.Sum();
}


Custom combinators 


It can be useful to write your own task combinators. The simplest “combinator” 
accepts a single task, such as the following, which lets you await any task with a 
timeout: 


async static Task<TResult> WithTimeout<TResult> (this Task<TResult> task,
                                                 TimeSpan timeout)
{
  Task winner = await Task.WhenAny (task, Task.Delay (timeout))
                          .ConfigureAwait (false);
  if (winner != task) throw new TimeoutException();
  return await task.ConfigureAwait (false);   // Unwrap result/re-throw
}


Because this is very much a “library method” that doesn’t access external shared 
state, we use ConfigureAwait(false) when awaiting to avoid potentially bouncing 
to a UI synchronization context. We can further improve efficiency by canceling the 
Task.Delay when the task completes on time (this avoids the small overhead of a 
timer hanging around): 


async static Task<TResult> WithTimeout<TResult> (this Task<TResult> task,
                                                 TimeSpan timeout)
{
  var cancelSource = new CancellationTokenSource();
  var delay = Task.Delay (timeout, cancelSource.Token);
  Task winner = await Task.WhenAny (task, delay).ConfigureAwait (false);
  if (winner == task)
    cancelSource.Cancel();
  else
    throw new TimeoutException();
  return await task.ConfigureAwait (false);   // Unwrap result/re-throw
}
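Extension methods must be declared in a non-nested static class, so to use WithTimeout you need a home for it. The following sketch puts the version above in a hypothetical TimeoutExtensions class; as a small variation, it also disposes the CancellationTokenSource via a using statement.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutExtensions   // Hypothetical class name
{
  public async static Task<TResult> WithTimeout<TResult> (this Task<TResult> task,
                                                         TimeSpan timeout)
  {
    using (var cancelSource = new CancellationTokenSource())
    {
      Task delay = Task.Delay (timeout, cancelSource.Token);
      Task winner = await Task.WhenAny (task, delay).ConfigureAwait (false);
      if (winner != task) throw new TimeoutException();
      cancelSource.Cancel();                       // On time: stop the Delay's timer
      return await task.ConfigureAwait (false);    // Unwrap result/re-throw
    }
  }
}
```

A caller then writes, for instance, string s = await SomeAsyncFunc().WithTimeout (TimeSpan.FromSeconds (5)); inside a try block that catches TimeoutException.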


The following lets you “abandon” a task via a CancellationToken:


static Task<TResult> WithCancellation<TResult> (this Task<TResult> task,
                                                CancellationToken cancelToken)
{
  var tcs = new TaskCompletionSource<TResult>();
  var reg = cancelToken.Register (() => tcs.TrySetCanceled());
  task.ContinueWith (ant =>
  {
    reg.Dispose();
    if (ant.IsCanceled)
      tcs.TrySetCanceled();
    else if (ant.IsFaulted)
      tcs.TrySetException (ant.Exception.InnerException);
    else
      tcs.TrySetResult (ant.Result);
  });
  return tcs.Task;
}


Task combinators can be complex to write, sometimes requiring the use of signaling 
constructs, which we cover in Chapter 22. This is actually a good thing, because it 
keeps concurrency-related complexity out of your business logic and into reusable 
methods that can be tested in isolation. 


The next combinator works like WhenAll, except that if any of the tasks fault, the resultant task faults immediately:


async Task<TResult[]> WhenAllOrError<TResult>
  (params Task<TResult>[] tasks)
{
  var killJoy = new TaskCompletionSource<TResult[]>();
  foreach (var task in tasks)
    task.ContinueWith (ant =>
    {
      if (ant.IsCanceled)
        killJoy.TrySetCanceled();
      else if (ant.IsFaulted)
        killJoy.TrySetException (ant.Exception.InnerException);
    });
  return await await Task.WhenAny (killJoy.Task, Task.WhenAll (tasks))
                         .ConfigureAwait (false);
}


We begin by creating a TaskCompletionSource whose sole job is to end the party if 
a task faults. Hence, we never call its SetResult method; only its TrySetCanceled 
and TrySetException methods. In this case, ContinueWith is more convenient than 
GetAwaiter().OnCompleted because we're not accessing the tasks’ results and
wouldn't want to bounce to a UI thread at that point. 


Asynchronous Locking 


In “Asynchronous semaphores and locks” on page 897 in Chapter 22, we describe how to use SemaphoreSlim to lock or limit concurrency asynchronously.
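As a brief preview, the usual shape is to await WaitAsync on a SemaphoreSlim created with a count of 1, and to call Release in a finally block. The class and member names in this sketch are hypothetical; the Task.Yield call is there only to force an asynchronous gap inside the protected region, which an ordinary lock statement would not permit.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLockDemo
{
  // A SemaphoreSlim with a count of 1 acts as an asynchronous lock.
  static readonly SemaphoreSlim _semaphore = new SemaphoreSlim (1, 1);
  static int _counter;

  public static int Counter => _counter;

  public static async Task IncrementAsync()
  {
    await _semaphore.WaitAsync();   // Waits without blocking a thread
    try
    {
      int value = _counter;
      await Task.Yield();           // An await inside the protected region
      _counter = value + 1;         // No other caller can interleave here
    }
    finally
    {
      _semaphore.Release();
    }
  }
}
```

Starting many IncrementAsync calls at once and awaiting them all leaves Counter equal to the number of calls, because no two callers can be between WaitAsync and Release at the same time.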


Obsolete Patterns 


.NET employs other patterns for asynchrony, which precede tasks and asynchro- 
nous functions. These are rarely required now that task-based asynchrony has 
become the dominant pattern. 


Asynchronous Programming Model 


The oldest pattern is called the Asynchronous Programming Model (APM) and uses a pair of methods starting in “Begin” and “End,” and an interface called IAsyncResult. To illustrate, let’s take the Stream class in System.IO and look at its Read method. First, the synchronous version:


public int Read (byte[] buffer, int offset, int size);


You can probably predict what the task-based asynchronous version looks like:


public Task<int> ReadAsync (byte[] buffer, int offset, int size);


Now let’s examine the APM version:


public IAsyncResult BeginRead (byte[] buffer, int offset, int size, 
AsyncCallback callback, object state); 
public int EndRead (IAsyncResult asyncResult); 


Calling the Begin* method initiates the operation, returning an IAsyncResult 
object, which acts as a token for the asynchronous operation. When the operation 
completes (or faults), the AsyncCallback delegate fires: 


public delegate void AsyncCallback (IAsyncResult ar); 


Whoever handles this delegate then calls the End* method, which provides the oper- 
ation’s return value as well as rethrowing an exception if the operation faulted. 


The APM is not only awkward to use, but also surprisingly difficult to implement 
correctly. The easiest way to deal with APM methods is to call the 
Task.Factory.FromAsync adapter method, which converts an APM method pair 
into a Task. Internally, it uses a TaskCompletionSource to give you a task that’s sig- 
naled when an APM operation completes or faults. 


The FromAsync method requires the following parameters: 


• A delegate specifying a BeginXXX method
• A delegate specifying an EndXXX method
• Additional arguments that will get passed to these methods


FromAsync is overloaded to accept delegate types and arguments that match nearly 
all the asynchronous method signatures found in .NET Core. For instance, assum- 
ing stream is a Stream and buffer is a byte[], we could do this: 


Task<int> readChunk = Task<int>.Factory.FromAsync ( 
stream.BeginRead, stream.EndRead, buffer, 0, 1000, null); 
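The following self-contained sketch wraps that call in a hypothetical ReadChunkAsync helper and uses a MemoryStream only so that no file or network is needed. In new code you would simply call ReadAsync; the FromAsync pattern matters when a type exposes nothing but a Begin/End pair.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class ApmDemo
{
  // Adapts Stream's APM pair (BeginRead/EndRead) into a Task<int>.
  public static Task<int> ReadChunkAsync (Stream stream, byte[] buffer) =>
    Task<int>.Factory.FromAsync (
      stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
}
```

For example, awaiting ApmDemo.ReadChunkAsync (new MemoryStream (new byte[] { 1, 2, 3 }), new byte [1000]) yields 3, the number of bytes actually available.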


Event-Based Asynchronous Pattern 


The Event-Based Asynchronous Pattern (EAP) was introduced in Framework 2.0 to 
provide a simpler alternative to the APM, particularly in UI scenarios. It was imple- 
mented in only a handful of types, however, most notably WebClient in 
System.Net. The EAP is just a pattern; no types are provided to assist. Essentially 
the pattern is this: a class offers a family of members that internally manage concur- 
rency, similar to the following: 


// These members are from the WebClient class:

public byte[] DownloadData (Uri address);     // Synchronous version
public void DownloadDataAsync (Uri address);
public void DownloadDataAsync (Uri address, object userToken);
public event DownloadDataCompletedEventHandler DownloadDataCompleted;

public void CancelAsync (object userState);   // Cancels an operation
public bool IsBusy { get; }                   // Indicates if still running


The *Async methods initiate an operation asynchronously. When the operation 
completes, the *Completed event fires (automatically posting to the captured syn- 
chronization context if present). This event passes back an event arguments object 
that contains the following: 


• A flag indicating whether the operation was canceled (by the consumer calling CancelAsync)
• An Error object indicating an exception that was thrown (if any)
• The userToken object if supplied when calling the Async method


EAP types can also expose a progress reporting event, which fires whenever progress changes (also posted through the synchronization context):


public event DownloadProgressChangedEventHandler DownloadProgressChanged;


Implementing the EAP requires a large amount of boilerplate code, making the pat- 
tern poorly compositional. 


BackgroundWorker 


BackgroundWorker in System.ComponentModel is a general-purpose implementa- 
tion of the EAP. It allows rich-client apps to start a worker thread and report 
completion and percentage-based progress without needing to explicitly capture
synchronization context. Here’s an example: 


var worker = new BackgroundWorker { WorkerSupportsCancellation = true };
worker.DoWork += (sender, args) =>
{                                      // This runs on a worker thread
  if (worker.CancellationPending) { args.Cancel = true; return; }
  Thread.Sleep (1000);
  args.Result = 123;
};
worker.RunWorkerCompleted += (sender, args) =>
{                                      // Runs on UI thread
  // We can safely update UI controls here...
  if (args.Cancelled)
    Console.WriteLine ("Cancelled");
  else if (args.Error != null)
    Console.WriteLine ("Error: " + args.Error.Message);
  else
    Console.WriteLine ("Result is: " + args.Result);
};

worker.RunWorkerAsync();   // Captures sync context and starts operation


RunWorkerAsync starts the operation, firing the DoWork event on a pooled worker 
thread. It also captures the synchronization context, and when the operation com- 
pletes (or faults), the RunWorkerCompleted event is invoked through that synchroni- 
zation context (like a continuation). 


BackgroundWorker creates coarse-grained concurrency, in that the DoWork event 
runs entirely on a worker thread. If you need to update UI controls in that event 
handler (other than posting a percentage-complete message), you must use 
Dispatcher.BeginInvoke or similar.


We describe BackgroundWorker in more detail online. 
15


Streams and I/O








This chapter describes the fundamental types for input and output in .NET, with 
emphasis on the following topics: 


• The .NET stream architecture and how it provides a consistent programming interface for reading and writing across a variety of I/O types
• Classes for manipulating files and directories on disk
• Specialized streams for compression, named pipes, and memory-mapped files


This chapter concentrates on the types in the System.IO namespace, the home of lower-level I/O functionality.


Stream Architecture 


The .NET stream architecture centers on three concepts: backing stores, decorators, 
and adapters, as shown in Figure 15-1. 


A backing store is the endpoint that makes input and output useful, such as a file or 
network connection. Precisely, it is either or both of the following: 


• A source from which bytes can be sequentially read
• A destination to which bytes can be sequentially written


A backing store is of no use, though, unless exposed to the programmer. A Stream is 
the standard .NET class for this purpose; it exposes a standard set of methods for 
reading, writing, and positioning. Unlike an array, for which all the backing data 
exists in memory at once, a stream deals with data serially—either one byte at a time 
or in blocks of a manageable size. Hence, a stream can use a small, fixed amount of 
memory regardless of the size of its backing store. 
Figure 15-1. Stream architecture


Streams fall into two categories: 


Backing store streams 
These are hardwired to a particular type of backing store, such as FileStream 
or NetworkStream. 


Decorator streams 
These feed off another stream, transforming the data in some way, such as 
DeflateStream or CryptoStream. 


Decorator streams have the following architectural benefits: 


• They liberate backing store streams from needing to implement such features as compression and encryption themselves.
• Streams don't suffer a change of interface when decorated.
• You connect decorators at runtime.
• You can chain decorators together (e.g., a compressor followed by an encryptor).


Both backing store and decorator streams deal exclusively in bytes. Although this is 
flexible and efficient, applications often work at higher levels such as text or XML. 
Adapters bridge this gap by wrapping a stream in a class with specialized methods 
typed to a particular format. For example, a text reader exposes a ReadLine method; 
an XML writer exposes a WriteAttributes method. 
An adapter wraps a stream, just like a decorator. Unlike a decorator, however, an adapter is not itself a stream; it typically hides the byte-oriented methods completely.


To summarize, backing store streams provide the raw data; decorator streams pro- 
vide transparent binary transformations such as encryption; adapters offer typed 
methods for dealing in higher-level types such as strings and XML. Figure 15-1 
illustrates their associations. To compose a chain, you simply pass one object into 
another’s constructor. 
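For instance, the following sketch composes a MemoryStream (backing store), a DeflateStream from System.IO.Compression (decorator), and a StreamWriter or StreamReader (adapter) to compress text and read it back. The ChainDemo class and its method names are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ChainDemo
{
  public static byte[] CompressText (string text)
  {
    var backing = new MemoryStream();                                          // Backing store
    using (var deflate = new DeflateStream (backing, CompressionMode.Compress)) // Decorator
    using (var writer = new StreamWriter (deflate, Encoding.UTF8))             // Adapter
      writer.Write (text);
    // Disposing the writer closed the whole chain; MemoryStream.ToArray
    // still works after the stream has been closed.
    return backing.ToArray();
  }

  public static string DecompressText (byte[] data)
  {
    using (var backing = new MemoryStream (data))
    using (var deflate = new DeflateStream (backing, CompressionMode.Decompress))
    using (var reader = new StreamReader (deflate, Encoding.UTF8))
      return reader.ReadToEnd();
  }
}
```

Each object is simply passed to the next one's constructor; swapping MemoryStream for a FileStream would change the backing store without touching the rest of the chain.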


Using Streams 


The abstract Stream class is the base for all streams. It defines methods and proper- 
ties for three fundamental operations: reading, writing, and seeking, as well as for 
administrative tasks such as closing, flushing, and configuring timeouts (see 
Table 15-1). 


Table 15-1. Stream class members


Category          | Members
------------------+-------------------------------------------------------------------
Reading           | public abstract bool CanRead { get; }
                  | public abstract int Read (byte[] buffer, int offset, int count);
                  | public virtual int ReadByte();
Writing           | public abstract bool CanWrite { get; }
                  | public abstract void Write (byte[] buffer, int offset, int count);
                  | public virtual void WriteByte (byte value);
Seeking           | public abstract bool CanSeek { get; }
                  | public abstract long Position { get; set; }
                  | public abstract void SetLength (long value);
                  | public abstract long Length { get; }
                  | public abstract long Seek (long offset, SeekOrigin origin);
Closing/flushing  | public virtual void Close();
                  | public void Dispose();
                  | public abstract void Flush();
Timeouts          | public virtual bool CanTimeout { get; }
                  | public virtual int ReadTimeout { get; set; }
                  | public virtual int WriteTimeout { get; set; }
Other             | public static readonly Stream Null;   // "Null" stream
                  | public static Stream Synchronized (Stream stream);
There are also asynchronous versions of the Read and Write methods, both of
which return Tasks and optionally accept a cancellation token, and overloads that 
work with Span<T> and Memory<T> types that we describe in Chapter 24. 


In the following example, we use a file stream to read, write, and seek: 


using System;
using System.IO;

class Program
{
  static void Main()
  {
    // Create a file called test.txt in the current directory:
    using (Stream s = new FileStream ("test.txt", FileMode.Create))
    {
      Console.WriteLine (s.CanRead);       // True
      Console.WriteLine (s.CanWrite);      // True
      Console.WriteLine (s.CanSeek);       // True

      s.WriteByte (101);
      s.WriteByte (102);
      byte[] block = { 1, 2, 3, 4, 5 };
      s.Write (block, 0, block.Length);    // Write block of 5 bytes

      Console.WriteLine (s.Length);        // 7
      Console.WriteLine (s.Position);      // 7
      s.Position = 0;                      // Move back to the start

      Console.WriteLine (s.ReadByte());    // 101
      Console.WriteLine (s.ReadByte());    // 102

      // Read from the stream back into the block array:
      Console.WriteLine (s.Read (block, 0, block.Length));   // 5

      // Assuming the last Read returned 5, we'll be at
      // the end of the file, so Read will now return 0:
      Console.WriteLine (s.Read (block, 0, block.Length));   // 0
    }
  }
}


Reading or writing asynchronously is simply a question of calling ReadAsync/ 
WriteAsync instead of Read/Write, and awaiting the expression (we must also add 
the async keyword to the calling method, as we described in Chapter 14): 


async static Task AsyncDemo()
{
  using (Stream s = new FileStream ("test.txt", FileMode.Create))
  {
    byte[] block = { 1, 2, 3, 4, 5 };
    await s.WriteAsync (block, 0, block.Length);   // Write asynchronously

    s.Position = 0;                                // Move back to the start

    // Read from the stream back into the block array:
    Console.WriteLine (await s.ReadAsync (block, 0, block.Length));   // 5
  }
}
The asynchronous methods make it easy to write responsive and scalable applica- 
tions that work with potentially slow streams (particularly network streams), 
without tying up a thread. 


For the sake of brevity, we'll continue to use synchronous 
methods for most of the examples in this chapter; however, we 
recommend the asynchronous Read/Write operations as pref- 
erable in most scenarios involving network I/O. 


Reading and Writing 


A stream can support reading, writing, or both. If CanWrite returns false, the 
stream is read-only; if CanRead returns false, the stream is write-only. 


Read receives a block of data from the stream into an array. It returns the number of bytes received, which is always less than or equal to the count argument. If it’s less than count, it means that either the end of the stream has been reached or the stream is giving you the data in smaller chunks (as is often the case with network streams). In either case, the balance of bytes in the array will remain unwritten, their previous values preserved.


With Read, you can be certain you've reached the end of the stream only when the method returns 0. So, if you have a 1,000-byte stream, the following code might fail to read it all into memory:

// Assuming s is a stream:
byte[] data = new byte [1000];
s.Read (data, 0, data.Length);

The Read method could read anywhere from 1 to 1,000 bytes, leaving the balance of the stream unread.
Here’s the correct way to read a 1,000-byte stream:


byte[] data = new byte [1000];

// bytesRead will always end up at 1000, unless the stream is
// itself smaller in length:

int bytesRead = 0;
int chunkSize = 1;
while (bytesRead < data.Length && chunkSize > 0)
  bytesRead +=
    chunkSize = s.Read (data, bytesRead, data.Length - bytesRead);
Fortunately, the BinaryReader type provides a simpler way to achieve the same result:

byte[] data = new BinaryReader (s).ReadBytes (1000);

If the stream is less than 1,000 bytes long, the byte array returned reflects the actual stream size. If the stream is seekable, you can read its entire contents by replacing 1000 with (int)s.Length.

We describe the BinaryReader type further in “Stream Adapters” on page 653.


The ReadByte method is simpler: it reads just a single byte, returning -1 to indicate the end of the stream. ReadByte actually returns an int rather than a byte because a byte cannot represent -1.
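A ReadByte loop therefore keeps the result in an int and tests for -1 before treating it as a byte. In this sketch, CountByte is a hypothetical helper that counts how often a given byte value occurs in the rest of a stream.

```csharp
using System.IO;

public static class ReadByteDemo
{
  // Counts how many times 'target' occurs from the current position onward.
  public static int CountByte (Stream s, byte target)
  {
    int count = 0;
    int b;
    while ((b = s.ReadByte()) != -1)     // -1 signals end of stream
      if ((byte) b == target) count++;   // Safe to cast: b is 0..255 here
    return count;
  }
}
```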


The Write and WriteByte methods send data to the stream. If they are unable to 
send the specified bytes, an exception is thrown. 


In the Read and Write methods, the offset argument refers to 
the index in the buffer array at which reading or writing 
begins, not the position within the stream. 


Seeking 


A stream is seekable if CanSeek returns true. With a seekable stream (such as a file 
stream), you can query or modify its Length (by calling SetLength) and at any time 
change the Position at which you're reading or writing. The Position property is 
relative to the beginning of the stream; the Seek method, however, allows you to 
move relative to the current position or the end of the stream. 
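The following sketch contrasts setting Position with calling Seek using each SeekOrigin value. It uses a MemoryStream, which (like a file stream) is seekable; the SeekDemo name is hypothetical. Seek returns the new position, shown in the comments.

```csharp
using System.IO;

public static class SeekDemo
{
  public static long[] Positions()
  {
    using (var s = new MemoryStream())
    {
      s.Write (new byte [10], 0, 10);            // Length 10, Position 10
      s.Position = 4;                            // Absolute: 4
      long a = s.Seek (3, SeekOrigin.Current);   // 4 + 3 = 7
      long b = s.Seek (-2, SeekOrigin.End);      // 10 - 2 = 8
      long c = s.Seek (1, SeekOrigin.Begin);     // Same as Position = 1
      s.SetLength (5);                           // Truncate to 5 bytes
      return new[] { a, b, c, s.Length };        // { 7, 8, 1, 5 }
    }
  }
}
```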


Changing the Position on a FileStream typically takes a few 
microseconds. If you're doing this millions of times in a loop, 
the MemoryMappedFile class might be a better choice than a 
FileStream (see “Memory-Mapped Files” on page 683). 


With a nonseekable stream (such as an encryption stream), the only way to deter- 
mine its length is to read it completely through. Furthermore, if you need to reread 
a previous section, you must close the stream and start afresh with a new one. 


Closing and Flushing 


Streams must be disposed after use to release underlying resources such as file and 
socket handles. A simple way to guarantee this is by instantiating streams within 
using blocks. In general, streams follow standard disposal semantics: 

• Dispose and Close are identical in function.
• Disposing or closing a stream repeatedly causes no error.
Closing a decorator stream closes both the decorator and its backing store stream.
With a chain of decorators, closing the outermost decorator (at the head of the 
chain) closes the whole lot. 
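You can observe this with a MemoryStream: once a DeflateStream wrapping it is disposed, the MemoryStream reports CanRead as false. DeflateStream also offers a constructor overload with a leaveOpen parameter for when you want the backing stream to survive. The CloseDemo name below is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class CloseDemo
{
  // Reports whether the backing MemoryStream is still open after the
  // DeflateStream decorator that wraps it has been disposed.
  public static bool BackingStillOpen (bool leaveOpen)
  {
    var backing = new MemoryStream();
    using (var deflate = new DeflateStream (backing, CompressionMode.Compress, leaveOpen))
      deflate.WriteByte (42);
    return backing.CanRead;   // False once the MemoryStream has been closed
  }
}
```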


Some streams internally buffer data to and from the backing store to lessen round- 
tripping and so improve performance (file streams are a good example of this). This 
means that data you write to a stream might not hit the backing store immediately; 
it can be delayed as the buffer fills up. The Flush method forces any internally buf- 
fered data to be written immediately. Flush is called automatically when a stream is 
closed, so you never need to do the following: 


s.Flush(); s.Close(); 


Timeouts 


A stream supports read and write timeouts if CanTimeout returns true. Network 
streams support timeouts; file and memory streams do not. For streams that sup- 
port timeouts, the ReadTimeout and WriteTimeout properties determine the desired 
timeout in milliseconds, where 0 means no timeout. The Read and Write methods 
indicate that a timeout has occurred by throwing an exception. 


The asynchronous ReadAsync/WriteAsync methods do not support timeouts; 
instead you can pass a cancellation token into these methods. 


Thread Safety 


As a rule, streams are not thread-safe, meaning that two threads cannot concur- 
rently read or write to the same stream without possible error. The Stream class 
offers a simple workaround via the static Synchronized method. This method 
accepts a stream of any type and returns a thread-safe wrapper. The wrapper works 
by obtaining an exclusive lock around each read, write, or seek, ensuring that only 
one thread can perform such an operation at a time. In practice, this allows multiple 
threads to simultaneously append data to the same stream—other kinds of activities 
(such as concurrent reading) require additional locking to ensure that each thread 
accesses the desired portion of the stream. We discuss thread safety fully in 
Chapter 22. 
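For example, in the following sketch several threads append to one MemoryStream through a Synchronized wrapper. Because the wrapper performs each WriteByte under its lock, no bytes are lost. The SyncStreamDemo name is hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class SyncStreamDemo
{
  // Appends 'count' bytes from parallel threads and returns the final length.
  public static long ParallelAppend (int count)
  {
    var inner = new MemoryStream();
    Stream safe = Stream.Synchronized (inner);   // Thread-safe wrapper
    Parallel.For (0, count, i => safe.WriteByte ((byte) i));
    return safe.Length;
  }
}
```

Note that the order of the appended bytes is still nondeterministic; the wrapper guarantees only that individual operations don't interfere.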


Backing Store Streams 


Figure 15-2 shows the key backing store streams provided by .NET Core. A “null
stream” is also available via the Stream’s static Null field. Null streams can be useful 
when writing unit tests. 


In the following sections, we describe FileStream and MemoryStream; in the final
section in this chapter, we describe IsolatedStorageStream. In Chapter 16, we 
cover NetworkStream. 
Figure 15-2. Backing store streams


FileStream 


Earlier in this section, we demonstrated the basic use of a FileStream to read and 
write bytes of data. Let’s now examine the special features of this class. 


If you're using UWP, file I/O is best done with the Windows Runtime types in Windows.Storage (see “File I/O in UWP” on page 676).


Constructing a FileStream 


The simplest way to instantiate a FileStream is to use one of the following static 
facade methods on the File class: 


FileStream fs1 = File.OpenRead ("readme.bin"); // Read-only 
FileStream fs2 = File.OpenWrite ("writeme.tmp"); // Write-only 
FileStream fs3 = File.Create ("readwrite.tmp"); // Read/write 


OpenWrite and Create differ in behavior if the file already exists. Create truncates 
any existing content; OpenWrite leaves existing content intact with the stream posi- 
tioned at zero. If you write fewer bytes than were previously in the file, OpenWrite 
leaves you with a mixture of old and new content. 
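The following sketch demonstrates the difference on a temporary file obtained from Path.GetTempFileName (the OpenWriteDemo name is hypothetical). Writing "Bye" over "Hello World" with OpenWrite leaves the mixed result "Byelo World", whereas Create truncates first.

```csharp
using System.IO;
using System.Text;

public static class OpenWriteDemo
{
  // Returns the file contents after overwriting via OpenWrite and via Create.
  public static (string afterOpenWrite, string afterCreate) Compare()
  {
    string path = Path.GetTempFileName();
    try
    {
      File.WriteAllText (path, "Hello World");
      byte[] bye = Encoding.ASCII.GetBytes ("Bye");

      using (FileStream fs = File.OpenWrite (path))   // Keeps existing content
        fs.Write (bye, 0, bye.Length);
      string afterOpenWrite = File.ReadAllText (path);   // "Byelo World"

      using (FileStream fs = File.Create (path))      // Truncates first
        fs.Write (bye, 0, bye.Length);
      string afterCreate = File.ReadAllText (path);      // "Bye"

      return (afterOpenWrite, afterCreate);
    }
    finally
    {
      File.Delete (path);
    }
  }
}
```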


You can also directly instantiate a FileStream. Its constructors provide access to 
every feature, allowing you to specify a filename or low-level file handle, file 
creation and access modes, and options for sharing, buffering, and security. The fol- 
lowing opens an existing file for read/write access without overwriting it (the using 
keyword ensures it is disposed when fs exits scope): 


using var fs = new FileStream ("readwrite.tmp", FileMode.Open); 


We look closer at FileMode shortly.
Shortcut Methods on the File Class


The following static methods read an entire file into memory in one step:

• File.ReadAllText (returns a string)
• File.ReadAllLines (returns an array of strings)
• File.ReadAllBytes (returns a byte array)

The following static methods write an entire file in one step:

• File.WriteAllText
• File.WriteAllLines
• File.WriteAllBytes
• File.AppendAllText (great for appending to a log file)

There’s also a static method called File.ReadLines: this is like ReadAllLines except that it returns a lazily evaluated IEnumerable<string>. This is more efficient because it doesn't load the entire file into memory at once. LINQ is ideal for consuming the results: the following calculates the number of lines greater than 80 characters in length:


int longLines = File.ReadLines ("filePath")
                    .Count (l => l.Length > 80);
Specifying a filename 


A filename can be either absolute (e.g., c:\temp\test.txt—or in Unix, /tmp/test.txt) or 
relative to the current directory (e.g., test.txt or temp\test.txt). You can access or 
change the current directory via the static Environment.CurrentDirectory 
property. 

When a program starts, the current directory might or might
not coincide with that of the program's executable. For this
reason, you should never rely on the current directory for
locating additional runtime files packaged along with your
executable.


AppDomain.CurrentDomain.BaseDirectory returns the application base directory,
which in normal cases is the folder containing the program's executable. To specify a
filename relative to this directory, you can call Path.Combine:


string baseFolder = AppDomain.CurrentDomain.BaseDirectory;
string logoPath = Path.Combine (baseFolder, "logo.jpg");
Console.WriteLine (File.Exists (logoPath));
You can read and write across a Windows network via a Universal Naming Convention
(UNC) path, such as \\JoesPC\PicShare\pic.jpg or \\10.1.1.2\PicShare\pic.jpg.
(To access a Windows file share from macOS or Unix, mount it to your file system
following instructions specific to your OS, and then open it using an ordinary path
from C#.)


Specifying a FileMode 


All of FileStream’s constructors that accept a filename also require a FileMode 
enum argument. Figure 15-3 shows how to choose a FileMode, and the choices yield 
results akin to calling a static method on the File class. 





[Figure 15-3 (decision chart): Read/write or read-only? For read/write: does the file
already exist? If yes, truncate the existing file (FileMode.Truncate) or not
(FileMode.Open). If no, use FileMode.CreateNew. If unsure, decide what should happen
to an existing file: truncate it (FileMode.Create, equivalent to File.Create()), let it be
(FileMode.OpenOrCreate), or append to it (FileMode.Append). *An exception is thrown
if you're wrong about whether the file exists.]


Figure 15-3. Choosing a FileMode
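To see what happens when a FileMode's assumption about the file turns out to be wrong, here is a small sketch. FileModeDemo and TryOpen are illustrative names; TryOpen opens a path with a given FileMode and reports the exception type, if any:

```csharp
using System.IO;

static class FileModeDemo
{
  // Returns "OK" if the file opens with the given mode; otherwise the name of
  // the IOException subtype thrown (e.g., FileNotFoundException).
  public static string TryOpen (string path, FileMode mode)
  {
    try
    {
      using (new FileStream (path, mode)) { }
      return "OK";
    }
    catch (IOException ex)
    {
      return ex.GetType().Name;
    }
  }
}
```

With a path that doesn't exist yet, FileMode.Open should report FileNotFoundException, the first FileMode.CreateNew should succeed, and a second CreateNew should report IOException because the file now exists.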


File.Create and FileMode.Create will throw an exception if 
used on hidden files. To overwrite a hidden file, you must 
delete and re-create it: 


File.Delete ("hidden.txt"); 
using var file = File.Create ("hidden.txt"); 


Constructing a FileStream with just a filename and FileMode gives you (with just
one exception) a readable/writable stream. You can request a downgrade if you also
supply a FileAccess argument:


[Flags] 
public enum FileAccess { Read = 1, Write = 2, ReadWrite = 3 } 


The following returns a read-only stream, equivalent to calling File.OpenRead: 
using var fs = new FileStream ("x.bin", FileMode.Open, FileAccess.Read);


FileMode.Append is the odd one out: with this mode, you get a write-only stream.
To append with read/write support, you must instead use FileMode.Open or
FileMode.OpenOrCreate and then seek to the end of the stream:


using var fs = new FileStream ("myFile.bin", FileMode.Open); 


fs.Seek (0, SeekOrigin.End); 
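Here is a fuller sketch of that approach, assuming the file already exists (FileMode.Open throws FileNotFoundException otherwise). Because the stream is read/write, it can append text and then rewind and read everything back through the same FileStream. The class and method names are illustrative:

```csharp
using System.IO;
using System.Text;

static class ReadWriteAppend
{
  public static string AppendAndReadBack (string path, string text)
  {
    using var fs = new FileStream (path, FileMode.Open);   // Read/write
    fs.Seek (0, SeekOrigin.End);                           // Move to the end

    byte[] bytes = Encoding.UTF8.GetBytes (text);
    fs.Write (bytes, 0, bytes.Length);                     // Append

    fs.Position = 0;                                       // Rewind
    var buffer = new byte [fs.Length];
    int total = 0;
    while (total < buffer.Length)                          // Read may return
    {                                                      // fewer bytes than
      int n = fs.Read (buffer, total, buffer.Length - total);  // requested
      if (n == 0) break;
      total += n;
    }
    return Encoding.UTF8.GetString (buffer, 0, total);
  }
}
```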


Advanced FileStream features 


Here are other optional arguments you can include when constructing a FileStream:


• A FileShare enum describing how much access to grant other processes wanting
  to dip into the same file before you've finished (None, Read [default], ReadWrite,
  or Write).

• The size, in bytes, of the internal buffer (default is currently 4 KB).

• A flag indicating whether to defer to the operating system for asynchronous I/O.

• A FileOptions flags enum for requesting operating system encryption
  (Encrypted), automatic deletion upon closure for temporary files (DeleteOnClose),
  and optimization hints (RandomAccess and SequentialScan). There is
  also a WriteThrough flag that requests that the OS disable write-behind caching;
  this is for transactional files or logs. Flags not supported by the underlying
  OS are silently ignored.


Opening a file with FileShare.ReadWrite allows other processes or users to
simultaneously read and write to the same file. To avoid chaos, you can all agree to lock
specified portions of the file before reading or writing, using these methods:


// Defined on the FileStream class: 
public virtual void Lock (long position, long length); 
public virtual void Unlock (long position, long length); 


Lock throws an exception if part or all of the requested file section has already been 
locked. 


MemoryStream 


MemoryStream uses an array as a backing store. This partly defeats the purpose of 
having a stream because the entire backing store must reside in memory at once. 
MemoryStream is still useful when you need random access to a nonseekable stream.
If you know the source stream will be of a manageable size, you can copy it into a 
MemoryStream as follows: 
var ms = new MemoryStream();
sourceStream.CopyTo (ms); 


You can convert a MemoryStream to a byte array by calling ToArray. The GetBuffer 
method does the same job more efficiently by returning a direct reference to the 
underlying storage array; unfortunately, this array is usually longer than the stream’s 
real length. 


Closing and flushing a MemoryStream is optional. If you close 
a MemoryStream, you can no longer read or write to it, but you
are still permitted to call ToArray to obtain the underlying 
data. Flush does absolutely nothing on a memory stream. 
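The following sketch illustrates both points: GetBuffer can return an array longer than the stream's content, and ToArray still works after the stream is closed. MemoryStreamDemo and Inspect are illustrative names. The exact buffer length is an implementation detail, so the sketch relies only on it being at least as long as the stream:

```csharp
using System.IO;

static class MemoryStreamDemo
{
  public static (int arrayLength, int bufferLength, long streamLength) Inspect()
  {
    var ms = new MemoryStream();                 // Expandable, exposable buffer
    ms.Write (new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

    int bufferLength = ms.GetBuffer().Length;    // Capacity, not content length
    long streamLength = ms.Length;               // Real length: 5

    ms.Close();                                  // No more reading or writing...
    int arrayLength = ms.ToArray().Length;       // ...but ToArray still works
    return (arrayLength, bufferLength, streamLength);
  }
}
```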


You can find further MemoryStream examples in “Compression Streams” on page 
661 and in “Encrypting in Memory” on page 872 in Chapter 21. 


PipeStream 


PipeStream provides a simple means by which one process can communicate with 
another through the operating system's pipes protocol. There are two kinds of pipe: 


Anonymous pipe (faster) 
Allows one-way communication between a parent and child process on the 
same computer. 


Named pipe (more flexible) 
Allows two-way communication between arbitrary processes on the same computer
or different computers across a network.


A pipe is good for interprocess communication (IPC) on a single computer: it 
doesn’t rely on a network transport, which means no network protocol overhead, 
and it has no issues with firewalls. 


Pipes are stream-based, so one process waits to receive a series 
of bytes while another process sends them. An alternative is 
for processes to communicate via a block of shared memory; 
we describe how to do this in “Memory-Mapped Files” on 
page 683. 


PipeStream is an abstract class with four concrete subtypes. Two are used for 
anonymous pipes and the other two for named pipes: 


Anonymous pipes 
AnonymousPipeServerStream and AnonymousPipeClientStream


Named pipes 
NamedPipeServerStream and NamedPipeClientStream 


Named pipes are simpler to use, so we describe them first. 
Named pipes


With named pipes, the parties communicate through a pipe of the same name. The 
protocol defines two distinct roles: the client and server. Communication happens 
between the client and server as follows: 


• The server instantiates a NamedPipeServerStream and then calls
  WaitForConnection.


• The client instantiates a NamedPipeClientStream and then calls Connect (with
  an optional timeout).


The two parties then read and write the streams to communicate. 


The following example demonstrates a server that sends a single byte (100) and then 
waits to receive a single byte: 


using var s = new NamedPipeServerStream ("pipedream"); 


s.WaitForConnection(); 
s.WriteByte (100); // Send the value 100. 
Console.WriteLine (s.ReadByte()); 


Here's the corresponding client code: 


using var s = new NamedPipeClientStream ("pipedream"); 


s.Connect(); 
Console.WriteLine (s.ReadByte()); 
s.WriteByte (200); // Send the value 200 back. 


Named pipe streams are bidirectional by default, so either party can read or write 
their stream. This means that the client and server must agree on some protocol to 
coordinate their actions, so both parties don't end up sending or receiving at once. 
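To try the exchange without launching two programs, the server and client can share one process, with the server running on a worker task. This is a sketch for experimentation only. NamedPipeRoundTrip is an illustrative name, and the random suffix on the pipe name is an assumption made to avoid clashing with other runs:

```csharp
using System;
using System.IO.Pipes;
using System.Threading.Tasks;

static class NamedPipeRoundTrip
{
  public static (int serverGot, int clientGot) Run()
  {
    string pipeName = "pipedream-" + Guid.NewGuid().ToString ("N").Substring (0, 8);

    using var server = new NamedPipeServerStream (pipeName);
    Task<int> serverTask = Task.Run (() =>
    {
      server.WaitForConnection();
      server.WriteByte (100);          // Server writes first...
      return server.ReadByte();        // ...then reads the reply.
    });

    using var client = new NamedPipeClientStream (pipeName);
    client.Connect (5000);             // Give up after 5 seconds
    int clientGot = client.ReadByte(); // Client reads first...
    client.WriteByte (200);            // ...then replies.

    return (serverTask.Result, clientGot);
  }
}
```

The agreed order matters: if both sides tried to read first, each would wait forever for the other.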


There also needs to be agreement on the length of each transmission. Our example
was trivial in this regard, because we bounced just a single byte in each direction. To
help with messages longer than one byte, pipes provide a message transmission
mode (Windows only). If this is enabled, a party calling Read can know when a
message is complete by checking the IsMessageComplete property. To demonstrate, we
begin by writing a helper method that reads a whole message from a message-enabled
PipeStream (in other words, reads until IsMessageComplete is true):


static byte[] ReadMessage (PipeStream s)
{
  MemoryStream ms = new MemoryStream();
  byte[] buffer = new byte [0x1000];      // Read in 4 KB blocks

  do    { ms.Write (buffer, 0, s.Read (buffer, 0, buffer.Length)); }
  while (!s.IsMessageComplete);

  return ms.ToArray();
}
(To make this asynchronous, replace “s.Read” with “await s.ReadAsync” and declare the method as async Task<byte[]>.)


You cannot determine whether a PipeStream has finished
reading a message simply by waiting for Read to return 0. This
is because, unlike most other stream types, pipe streams and
network streams have no definite end. Instead, they
temporarily “dry up” between message transmissions.


Now we can activate message transmission mode. On the server, this is done by 
specifying PipeTransmissionMode.Message when constructing the stream: 


using var s = new NamedPipeServerStream ("pipedream", PipeDirection.InOut, 
1, PipeTransmissionMode.Message) ; 


s.WaitForConnection(); 


byte[] msg = Encoding.UTF8.GetBytes ("Hello"); 
s.Write (msg, 0, msg.Length); 


Console.WriteLine (Encoding.UTF8.GetString (ReadMessage (s))); 


On the client, we activate message transmission mode by setting ReadMode after
calling Connect:


using var s = new NamedPipeClientStream ("pipedream"); 


s.Connect(); 
s.ReadMode = PipeTransmissionMode.Message; 


Console.WriteLine (Encoding.UTF8.GetString (ReadMessage (s))); 


byte[] msg = Encoding.UTF8.GetBytes ("Hello right back!"); 
s.Write (msg, 0, msg.Length); 


Message mode is supported only on Windows. Other
platforms throw PlatformNotSupportedException.


Anonymous pipes 


An anonymous pipe provides a one-way communication stream between a parent 
and child process. Instead of using a system-wide name, anonymous pipes tune in 
through a private handle. 


As with named pipes, there are distinct client and server roles. The system of
communication is a little different, however, and proceeds as follows:


1. The server instantiates an AnonymousPipeServerStream, committing to a
   PipeDirection of In or Out.

2. The server calls GetClientHandleAsString to obtain an identifier for the pipe,
   which it then passes to the client (typically as an argument when starting the
   child process).

3. The child process instantiates an AnonymousPipeClientStream, specifying the
   opposite PipeDirection.

4. The server releases the local handle that was generated in Step 2 by calling
   DisposeLocalCopyOfClientHandle.

5. The parent and child processes communicate by reading/writing the stream.


Because anonymous pipes are unidirectional, a server must create two pipes for 
bidirectional communication. The following Console program creates two pipes 
(input and output) and then starts up a child process. It then sends a single byte to 
the child process and receives a single byte in return: 


using System;
using System.Diagnostics;
using System.IO;
using System.IO.Pipes;
using System.Reflection;

class Program
{
  static void Main (string[] args)
  {
    if (args.Length == 0)
      // No arguments signals server mode
      AnonymousPipeServer();
    else
      // We pass in the pipe handle IDs as arguments to signal client mode
      AnonymousPipeClient (args [0], args [1]);
  }

  static void AnonymousPipeClient (string rxID, string txID)
  {
    using (var rx = new AnonymousPipeClientStream (PipeDirection.In, rxID))
    using (var tx = new AnonymousPipeClientStream (PipeDirection.Out, txID))
    {
      Console.WriteLine ("Client received: " + rx.ReadByte ());
      tx.WriteByte (200);
    }
  }

  static void AnonymousPipeServer ()
  {
    using var tx = new AnonymousPipeServerStream (
                     PipeDirection.Out, HandleInheritability.Inheritable);
    using var rx = new AnonymousPipeServerStream (
                     PipeDirection.In, HandleInheritability.Inheritable);

    string txID = tx.GetClientHandleAsString ();
    string rxID = rx.GetClientHandleAsString ();

    // Create and start up a child process.
    // We'll use the same Console executable, but pass in arguments:
    string thisAssembly = Assembly.GetEntryAssembly().Location;
    string thisExe = Path.ChangeExtension (thisAssembly, ".exe");
    var args = $"{txID} {rxID}";

    var startInfo = new ProcessStartInfo (thisExe, args);
    startInfo.UseShellExecute = false;      // Required for child process
    Process p = Process.Start (startInfo);

    tx.DisposeLocalCopyOfClientHandle ();    // Release unmanaged
    rx.DisposeLocalCopyOfClientHandle ();    // handle resources.

    tx.WriteByte (100);                      // Send a byte to the child process
    Console.WriteLine ("Server received: " + rx.ReadByte ());

    p.WaitForExit ();
  }
}
As with named pipes, the client and server must coordinate their sending and 
receiving and agree on the length of each transmission. Anonymous pipes don't, 
unfortunately, support message mode, so you must implement your own protocol 
for message length agreement. One solution is to send, in the first four bytes of each 
transmission, an integer value defining the length of the message to follow. The 
BitConverter class provides methods for converting between an integer and an 
array of four bytes. 
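Here is one way to implement that suggestion. It's a minimal sketch, not a library API: LengthPrefixedMessages and its methods are hypothetical names. It works on any Stream, so it can be tested with a MemoryStream before being used with an anonymous pipe. BitConverter uses the machine's byte order, which is fine here because both ends of an anonymous pipe run on the same computer:

```csharp
using System;
using System.IO;

static class LengthPrefixedMessages
{
  // Writes a 4-byte length header followed by the payload.
  public static void WriteMessage (Stream s, byte[] message)
  {
    byte[] header = BitConverter.GetBytes (message.Length);   // 4 bytes
    s.Write (header, 0, header.Length);
    s.Write (message, 0, message.Length);
    s.Flush();
  }

  // Reads the header, then exactly that many payload bytes.
  public static byte[] ReadMessage (Stream s)
  {
    int length = BitConverter.ToInt32 (ReadExactly (s, 4), 0);
    return ReadExactly (s, length);
  }

  // A single Read may return fewer bytes than requested, so loop.
  static byte[] ReadExactly (Stream s, int count)
  {
    var buffer = new byte [count];
    int offset = 0;
    while (offset < count)
    {
      int n = s.Read (buffer, offset, count - offset);
      if (n == 0) throw new EndOfStreamException ("Stream ended mid-message");
      offset += n;
    }
    return buffer;
  }
}
```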


BufferedStream 


BufferedStream decorates, or wraps, another stream with buffering capability, and
it is one of a number of decorator stream types in .NET Core, all of which are
illustrated in Figure 15-4.
Figure 15-4. Decorator streams
Buffering improves performance by reducing round trips to the backing store. 
Here's how we wrap a FileStream in a 20 KB BufferedStream: 


// Write 100K to a file: 
File.WriteAllBytes ("myFile.bin", new byte [100000]);


using FileStream fs = File.OpenRead ("myFile.bin"); 
using BufferedStream bs = new BufferedStream (fs, 20000); //20K buffer 


bs.ReadByte(); 
Console.WriteLine (fs.Position); // 20000 
In this example, the underlying stream advances 20,000 bytes after reading just 1
byte, thanks to the read-ahead buffering. We could call ReadByte another 19,999 
times before the FileStream would be hit again. 


Coupling a BufferedStream to a FileStream, as in this example, is of limited value
because FileStream already has built-in buffering. Its only use might be in
enlarging the buffer on an already constructed FileStream.


Closing a BufferedStream automatically closes the underlying backing store stream.


Stream Adapters 


A Stream deals only in bytes; to read or write data types such as strings, integers, or 
XML elements, you must plug in an adapter. Here’s what .NET Core provides: 


Text adapters (for string and character data) 
TextReader, TextWriter 
StreamReader, StreamWriter 
StringReader, StringWriter 


Binary adapters (for primitive types such as int, bool, string, and float) 
BinaryReader, BinaryWriter 


XML adapters (covered in Chapter 11) 
XmlReader, XmlWriter 


Figure 15-5 illustrates the relationships between these types. 
Figure 15-5. Readers and writers
Text Adapters


TextReader and TextWriter are the abstract base classes for adapters that deal
exclusively with characters and strings. Each has two general-purpose
implementations in .NET Core:


StreamReader/StreamWriter 
Uses a Stream for its raw data store, translating the stream's bytes into
characters or strings


StringReader/StringWriter 
Implements TextReader/TextWriter using in-memory strings 


Table 15-2 lists TextReader's members by category. Peek returns the next character
in the stream without advancing the position. Both Peek and the zero-argument
version of Read return -1 if at the end of the stream; otherwise, they return an
integer that can be cast directly to a char. The overload of Read that accepts a char[]
buffer is similar to the ReadBlock method, except that it can return fewer characters
than requested before the end of the stream is reached; ReadBlock keeps reading until
it has the requested count or reaches the end. ReadLine reads until reaching either a
CR (character 13) or LF (character 10), or a CR+LF pair in sequence. It then returns a
string, discarding the CR/LF characters.
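The following sketch exercises Peek, Read, and ReadLine with a StringReader, so no file is needed. The input mixes CR+LF, LF, and CR line endings, and the loop stops when ReadLine returns null at the end of the data. TextReaderDemo is an illustrative name:

```csharp
using System.Collections.Generic;
using System.IO;

static class TextReaderDemo
{
  public static (char peeked, char read, List<string> lines, int atEnd) Run()
  {
    TextReader reader = new StringReader ("one\r\ntwo\nthree\rfour");

    char peeked = (char) reader.Peek();   // 'o'; position is unchanged
    char read   = (char) reader.Read();   // 'o'; position advances

    var lines = new List<string>();
    string line;
    while ((line = reader.ReadLine()) != null)   // null signals the end
      lines.Add (line);                          // CR/LF characters discarded

    return (peeked, read, lines, reader.Peek()); // Peek returns -1 at the end
  }
}
```

Because the first Read consumed the initial 'o', the lines collected should be "ne", "two", "three", and "four".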


Table 15-2. TextReader members


Category             Members

Reading one char     public virtual int Peek();    // Cast the result to a char
                     public virtual int Read();    // Cast the result to a char

Reading many chars   public virtual int Read (char[] buffer, int index, int count);
                     public virtual int ReadBlock (char[] buffer, int index, int count);
                     public virtual string ReadLine();
                     public virtual string ReadToEnd();

Closing              public virtual void Close();
                     public void Dispose();        // Same as Close

Other                public static readonly TextReader Null;
                     public static TextReader Synchronized (TextReader reader);
Environment.NewLine returns the newline sequence for the
current OS.


On Windows, this is "\r\n" (think “ReturN”) and is loosely
modeled on a mechanical typewriter: a CR (character 13)
followed by an LF (character 10). Reverse the order and you'll
get either two new lines or none!


On Unix and macOS, it's simply "\n".


TextWriter has analogous methods for writing, as shown in Table 15-3. The Write 
and WriteLine methods are additionally overloaded to accept every primitive type, 
plus the object type. These methods simply call the ToString method on whatever 
is passed in (optionally through an IFormatProvider specified either when calling 
the method or when constructing the TextWriter). 


Table 15-3. TextWriter members


Category                  Members

Writing one char          public virtual void Write (char value);

Writing many chars        public virtual void Write (string value);
                          public virtual void Write (char[] buffer, int index, int count);
                          public virtual void Write (string format, params object[] arg);
                          public virtual void WriteLine (string value);

Closing and flushing      public virtual void Close();
                          public void Dispose();        // Same as Close
                          public virtual void Flush();

Formatting and encoding   public virtual IFormatProvider FormatProvider { get; }
                          public virtual string NewLine { get; set; }
                          public abstract Encoding Encoding { get; }

Other                     public static readonly TextWriter Null;
                          public static TextWriter Synchronized (TextWriter writer);





WriteLine simply appends the given text with Environment.NewLine. You can 
change this via the NewLine property (this can be useful for interoperability with 
Unix file formats). 
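For example, here is a sketch that produces Unix-style line endings regardless of the host OS. It uses a StringWriter so that the result is easy to inspect; NewLineDemo is an illustrative name:

```csharp
using System.IO;

static class NewLineDemo
{
  public static string WriteUnixLines()
  {
    using var writer = new StringWriter();
    writer.NewLine = "\n";          // Override Environment.NewLine
    writer.WriteLine ("Line1");
    writer.WriteLine ("Line2");
    return writer.ToString();       // "Line1\nLine2\n"
  }
}
```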


As with Stream, TextReader and TextWriter offer task-based 
asynchronous versions of their read/write methods. 
StreamReader and StreamWriter


In the following example, a StreamWriter writes two lines of text to a file and then a 
StreamReader reads the file back: 


using (FileStream fs = File.Create ("test.txt"))
using (TextWriter writer = new StreamWriter (fs))
{
  writer.WriteLine ("Line1");
  writer.WriteLine ("Line2");
}

using (FileStream fs = File.OpenRead ("test.txt"))
using (TextReader reader = new StreamReader (fs))
{
  Console.WriteLine (reader.ReadLine());       // Line1
  Console.WriteLine (reader.ReadLine());       // Line2
}


Because text adapters are so often coupled with files, the File class provides the 
static methods CreateText, AppendText, and OpenText to shortcut the process: 


using (TextWriter writer = File.CreateText ("test.txt"))
{
  writer.WriteLine ("Line1");
  writer.WriteLine ("Line2");
}

using (TextWriter writer = File.AppendText ("test.txt"))
  writer.WriteLine ("Line3");

using (TextReader reader = File.OpenText ("test.txt"))
  while (reader.Peek() > -1)
    Console.WriteLine (reader.ReadLine());     // Line1
                                               // Line2
                                               // Line3


This also illustrates how to test for the end of a file (viz. reader.Peek()). Another
option is to read until reader.ReadLine returns null.


You can also read and write other types such as integers, but because TextWriter 
invokes ToString on your type, you must parse a string when reading it back: 


using (TextWriter w = File.CreateText ("data.txt"))
{
  w.WriteLine (123);          // Writes "123"
  w.WriteLine (true);         // Writes the word "True"
}
using (TextReader r = File.OpenText ("data.txt"))
{
  int myInt = int.Parse (r.ReadLine());     // myInt == 123
  bool yes = bool.Parse (r.ReadLine());     // yes == true
}
Character encodings


TextReader and TextWriter are by themselves just abstract classes with no
connection to a stream or backing store. The StreamReader and StreamWriter types,
however, are connected to an underlying byte-oriented stream, so they must convert
between characters and bytes. They do so through an Encoding class from the
System.Text namespace, which you choose when constructing the StreamReader
or StreamWriter. If you choose none, the default UTF-8 encoding is used.


If you explicitly specify an encoding, StreamWriter will, by
default, write a prefix to the start of the stream to identify the
encoding. This is usually undesirable, and you can prevent it
by constructing the encoding as follows:
var encoding = new UTF8Encoding (
  encoderShouldEmitUTF8Identifier: false,
  throwOnInvalidBytes: true);
The second argument tells the StreamWriter (or StreamReader)
to throw an exception if it encounters bytes that do
not have a valid string translation for their encoding, which
matches its default behavior if you do not specify an encoding.
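You can observe the prefix by writing to a MemoryStream. This sketch (PreambleDemo is an illustrative name) returns the bytes a StreamWriter produces for "hi" with a given encoding, where null means "use StreamWriter's default":

```csharp
using System.IO;
using System.Text;

static class PreambleDemo
{
  public static byte[] Write (Encoding encoding)
  {
    var ms = new MemoryStream();
    using (var writer = encoding == null
                          ? new StreamWriter (ms)
                          : new StreamWriter (ms, encoding))
      writer.Write ("hi");
    return ms.ToArray();     // Still allowed after the stream is closed
  }
}
```

Encoding.UTF8 should add the three-byte prefix 0xEF 0xBB 0xBF; the default StreamWriter encoding and new UTF8Encoding (false, true) should write just the two bytes of "hi".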


The simplest of the encodings is ASCII because each character is represented by one
byte. The ASCII encoding maps the first 128 characters of the Unicode set (0 to 127)
into its single byte, covering what you see on a US-style keyboard. Most other
characters, including specialized symbols and non-English characters, cannot be
represented and are converted to the ? character. The default UTF-8 encoding can map
all allocated Unicode characters, but it is more complex. The first 128 characters encode to
a single byte, for ASCII compatibility; the remaining characters encode to a variable
number of bytes (most commonly two or three). Consider the following:














using (TextWriter w = File.CreateText ("but.txt"))    // Use default UTF-8
  w.WriteLine ("but\u2014");                          // encoding.

using (Stream s = File.OpenRead ("but.txt"))
  for (int b; (b = s.ReadByte()) > -1;)
    Console.WriteLine (b);
The word “but” is followed not by a stock-standard hyphen, but by the longer em
dash character, U+2014 (written as the escape \u2014 in the code). This is the one that
won't get you into trouble with your book editor! Let's examine the output:


98     // b
117    // u
116    // t
226    // em dash byte 1       Note that the byte values
128    // em dash byte 2       are >= 128 for each part
148    // em dash byte 3       of the multibyte sequence.
13     // <CR>
10     // <LF>


Because the em dash is outside the first 128 characters of the Unicode set, it
requires more than a single byte to encode in UTF-8 (in this case, three). UTF-8 is
efficient with the Western alphabet because most popular characters consume just one
byte. It also downgrades easily to ASCII simply by ignoring all bytes above 127. Its
disadvantage is that seeking within a stream is troublesome because a character's
position does not correspond to its byte position in the stream. An alternative is
UTF-16 (labeled just Unicode in the Encoding class). Here's how we write the same
string with UTF-16:


using (Stream s = File.Create ("but.txt"))
using (TextWriter w = new StreamWriter (s, Encoding.Unicode))
  w.WriteLine ("but\u2014");


foreach (byte b in File.ReadAllBytes ("but.txt")) 
Console.WriteLine (b); 


And here’s the output: 


255    // Byte-order mark 1
254    // Byte-order mark 2
98     // 'b' byte 1
0      // 'b' byte 2
117    // 'u' byte 1
0      // 'u' byte 2
116    // 't' byte 1
0      // 't' byte 2
20     // em dash byte 1
32     // em dash byte 2
13     // <CR> byte 1
0      // <CR> byte 2
10     // <LF> byte 1
0      // <LF> byte 2


Technically, UTF-16 uses either two or four bytes per character (there are close to a 
million Unicode characters allocated or reserved, so two bytes is not always 
enough). However, because the C# char type is itself only 16 bits wide, a UTF-16 
encoding will always use exactly two bytes per .NET char. This makes it easy to 
jump to a particular character index within a stream. 
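The byte counts discussed in this section can be checked without touching a file, using Encoding's GetByteCount and GetBytes methods. In this sketch (EncodingSizes is an illustrative name), the em dash is written as the escape \u2014, and U+1F600 stands in for any character that doesn't fit in 16 bits:

```csharp
using System.Text;

static class EncodingSizes
{
  public const string Text = "but\u2014";          // "but" + em dash (U+2014)

  public static int Utf8Bytes  => Encoding.UTF8.GetByteCount (Text);     // 3 + 3
  public static int Utf16Bytes => Encoding.Unicode.GetByteCount (Text);  // 4 x 2

  // ASCII can't represent the em dash, so it is replaced with '?' (63).
  public static byte[] AsciiBytes => Encoding.ASCII.GetBytes (Text);

  // Outside the 16-bit range: two .NET chars (a surrogate pair), four UTF-16 bytes.
  public const string Emoji = "\U0001F600";
}
```

UTF-8 should need 6 bytes, UTF-16 should need 8 (two per char), and ASCII should produce 98, 117, 116, 63. The emoji occupies two chars and still costs exactly two bytes per char in UTF-16.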


UTF-16 uses a two-byte prefix to identify whether the byte pairs are written in a
little-endian or big-endian order (the least significant byte first or the most
significant byte first). The default little-endian order is standard for Windows-based
systems.
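Encoding exposes each byte order's prefix through GetPreamble. This sketch (Utf16ByteOrder is an illustrative name) encodes the letter b, prefixed by its byte-order mark, in both orders:

```csharp
using System.Text;

static class Utf16ByteOrder
{
  public static byte[] LittleEndian() => WithPreamble (Encoding.Unicode);
  public static byte[] BigEndian()    => WithPreamble (Encoding.BigEndianUnicode);

  // Concatenates the encoding's byte-order mark with the bytes for "b".
  static byte[] WithPreamble (Encoding e)
  {
    byte[] bom  = e.GetPreamble();
    byte[] body = e.GetBytes ("b");
    var result = new byte [bom.Length + body.Length];
    bom.CopyTo (result, 0);
    body.CopyTo (result, bom.Length);
    return result;
  }
}
```

Little-endian should give 255, 254, 98, 0, matching the start of the UTF-16 output shown earlier; big-endian should give 254, 255, 0, 98.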


StringReader and StringWriter 


The StringReader and StringWriter adapters don't wrap a stream at all; instead,
they use a string or StringBuilder as the underlying data source. This means no
byte translation is required; in fact, the classes do nothing you couldn't easily
achieve with a string or StringBuilder coupled with an index variable. Their
advantage, though, is that they share a base class with StreamReader/StreamWriter. For
instance, suppose that we have a string containing XML and want to parse it with an
XmlReader. The XmlReader.Create method accepts one of the following:
• A URI
• A Stream
• A TextReader


So, how do we XML-parse our string? Because StringReader is a subclass of
TextReader, we're in luck. We can instantiate and pass in a StringReader as follows:


XmlReader r = XmlReader.Create (new StringReader (myString)); 
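Here is a slightly fuller sketch that walks the parsed document and collects element names. XmlFromString is an illustrative name, and the XML used to exercise it is made up:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class XmlFromString
{
  public static List<string> ElementNames (string xml)
  {
    var names = new List<string>();
    using (XmlReader r = XmlReader.Create (new StringReader (xml)))
      while (r.Read())
        if (r.NodeType == XmlNodeType.Element)
          names.Add (r.Name);
    return names;
  }
}
```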


Binary Adapters 


BinaryReader and BinaryWriter read and write native data types: bool, byte, char,
decimal, float, double, short, int, long, sbyte, ushort, uint, and ulong, as well
as strings and arrays of the primitive data types.


Unlike StreamReader and StreamWriter, binary adapters store primitive data types
efficiently, as they are represented in memory. So, an int uses four bytes; a
double uses eight bytes. Strings are written through a text encoding (as with
StreamReader and StreamWriter) but are length-prefixed in order to make it possible to
read back a series of strings without needing special delimiters.


Imagine that we have a simple type, defined as follows: 


public class Person 

{ 
public string Name; 
public int Age; 
public double Height; 

} 


We can add the following methods to Person to save/load its data to/from a stream 
using binary adapters: 


public void SaveData (Stream s) 
{ 
var w = new BinaryWriter (s); 
w.Write (Name); 
w.Write (Age); 
w.Write (Height); 
w.Flush(); // Ensure the BinaryWriter buffer is cleared. 
// We won't dispose/close it, so more data 
} // can be written to the stream. 


public void LoadData (Stream s) 
{ 


var r = new BinaryReader (s); 


Name = r.ReadString(); 

Age = r.ReadInt32(); 

Height = r.ReadDouble(); 
} 
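To see the round trip, the following sketch restates Person with the two methods above, saves an instance to a MemoryStream, and then loads it back into a new object. PersonRoundTrip and the sample values are illustrative:

```csharp
using System.IO;

public class Person
{
  public string Name;
  public int Age;
  public double Height;

  public void SaveData (Stream s)
  {
    var w = new BinaryWriter (s);
    w.Write (Name);
    w.Write (Age);
    w.Write (Height);
    w.Flush();               // Leave the stream open for further use
  }

  public void LoadData (Stream s)
  {
    var r = new BinaryReader (s);
    Name = r.ReadString();
    Age = r.ReadInt32();
    Height = r.ReadDouble();
  }
}

static class PersonRoundTrip
{
  public static (Person copy, long bytesWritten) Run()
  {
    var ms = new MemoryStream();
    new Person { Name = "Joe", Age = 30, Height = 1.8 }.SaveData (ms);
    long bytesWritten = ms.Length;

    ms.Position = 0;                      // Rewind before reading
    var copy = new Person();
    copy.LoadData (ms);
    return (copy, bytesWritten);
  }
}
```

The 16 bytes written break down as a 1-byte length prefix for "Joe", 3 bytes of UTF-8 text, 4 bytes for the int, and 8 bytes for the double.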
BinaryReader can also read into byte arrays. The following reads the entire contents
of a seekable stream: 


byte[] data = new BinaryReader (s).ReadBytes ((int) s.Length); 


This is more convenient than reading directly from a stream because it doesn’t 
require a loop to ensure that all data has been read. 


Closing and Disposing Stream Adapters 


You have four choices in tearing down stream adapters: 


1. Close the adapter only

2. Close the adapter and then close the stream

3. (For writers) Flush the adapter and then close the stream

4. (For readers) Close just the stream


Close and Dispose are synonymous with adapters, just as they 
are with streams. 


Options 1 and 2 are semantically identical because closing an adapter automatically 
closes the underlying stream. Whenever you nest using statements, you're implicitly 
taking option 2: 

using (FileStream fs = File.Create ("test.txt")) 


using (TextWriter writer = new StreamWriter (fs)) 
writer.WriteLine ("Line"); 


Because the nest disposes from the inside out, the adapter is first closed, and then 
the stream. Furthermore, if an exception is thrown within the adapter’s constructor, 
the stream still closes. It’s hard to go wrong with nested using statements! 


Never close a stream before closing or flushing its writer— 
you'll amputate any data that’s buffered in the adapter. 


Options 3 and 4 work because adapters are in the unusual category of optionally 
disposable objects. An example of when you might choose not to dispose an adapter 
is when you've finished with the adapter, but you want to leave the underlying 
stream open for subsequent use: 


using (FileStream fs = new FileStream ("test.txt", FileMode.Create)) 


{ 


StreamWriter writer = new StreamWriter (fs); 
writer.WriteLine ("Hello"); 
writer.Flush(); 


fs.Position = 0; 
Console.WriteLine (fs.ReadByte());
} 


Here, we write to a file, reposition the stream, and then read the first byte before
closing the stream. If we disposed the StreamWriter, it would also close the
underlying FileStream, causing the subsequent read to fail. The proviso is that we call
Flush to ensure that the StreamWriter's buffer is written to the underlying stream.


Stream adapters (with their optional disposal semantics) do
not implement the extended disposal pattern where the
finalizer calls Dispose. This allows an abandoned adapter to evade
automatic disposal when the garbage collector catches up with
it.


There's also a constructor on StreamReader/StreamWriter that instructs it to keep
the stream open after disposal. Consequently, we can rewrite the preceding example
as follows:


using (var fs = new FileStream ("test.txt", FileMode.Create) ) 


{ 


using (var writer = new StreamWriter (fs, new UTF8Encoding (false, true), 
0x400, true) ) 
writer.WriteLine ("Hello"); 


fs.Position = 0; 
Console.WriteLine (fs.ReadByte()); 
Console.WriteLine (fs.Length); 


} 
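The effect of the final (leaveOpen) constructor argument can be checked with a MemoryStream, whose CanRead property turns false once it has been closed. AdapterDisposal is an illustrative name:

```csharp
using System.IO;
using System.Text;

static class AdapterDisposal
{
  // Reports whether the underlying stream is still usable after the
  // StreamWriter wrapping it has been disposed.
  public static bool StreamUsableAfterWriterDisposed (bool leaveOpen)
  {
    var ms = new MemoryStream();
    using (var writer = new StreamWriter (ms, new UTF8Encoding (false, true),
                                          0x400, leaveOpen))
      writer.Write ("Hello");
    return ms.CanRead;     // false once the MemoryStream has been closed
  }
}
```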


Compression Streams 


Two general-purpose compression streams are provided in the System.IO.Compression
namespace: DeflateStream and GZipStream. Both use a popular
compression algorithm similar to that of the ZIP format. They differ in that
GZipStream writes an additional protocol at the start and end, including a CRC to
detect errors. GZipStream also conforms to a standard recognized by other software.
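One visible consequence of GZipStream's extra protocol is a fixed signature at the start of its output: a gzip stream begins with the bytes 0x1F and 0x8B, whereas DeflateStream emits raw compressed data. This sketch (CompressedHeaders is an illustrative name) compresses the same input both ways:

```csharp
using System.IO;
using System.IO.Compression;

static class CompressedHeaders
{
  public static byte[] Compress (bool gzip, byte[] data)
  {
    var ms = new MemoryStream();
    using (Stream cs = gzip
                         ? new GZipStream (ms, CompressionMode.Compress)
                         : (Stream) new DeflateStream (ms, CompressionMode.Compress))
      cs.Write (data, 0, data.Length);
    return ms.ToArray();     // Works even though disposing cs closed ms
  }
}
```

The GZip output should begin with 0x1F 0x8B and be larger than the Deflate output by the size of its header and trailer.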
.NET Core also includes BrotliStream, which implements the Brotli compression
algorithm. BrotliStream is more than 10 times slower than DeflateStream and 
GZipStream but achieves a better compression ratio. (The performance hit applies 
only to compression—decompression performs very well.) 


All three streams allow reading and writing, with the following provisos: 


• You always write to the stream when compressing.

• You always read from the stream when decompressing.


DeflateStream, GZipStream, and BrotliStream are decorators; they compress or
decompress data from another stream that you supply in construction. In the
following example, we compress and decompress a series of bytes using a
FileStream as the backing store:


using (Stream s = File.Create ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Compress))
  for (byte i = 0; i < 100; i++)
    ds.WriteByte (i);

using (Stream s = File.OpenRead ("compressed.bin"))
using (Stream ds = new DeflateStream (s, CompressionMode.Decompress))
  for (byte i = 0; i < 100; i++)
    Console.WriteLine (ds.ReadByte());     // Writes 0 to 99


With DeflateStream, the compressed file is 102 bytes: slightly larger than the
original (BrotliStream would compress it to 73 bytes). Compression works poorly with
“dense,” nonrepetitive binary data (and worst of all with encrypted data, which lacks
regularity by design). It works well with most text files. In the next example, we
use the Brotli algorithm to compress and decompress a text stream composed of 1,000
words chosen randomly from a short sentence. This also demonstrates chaining a
backing store stream, a decorator stream, an adapter (as depicted at the start of the
chapter in Figure 15-1), and the use of asynchronous methods:


string[] words = "The quick brown fox jumps over the lazy dog".Split();
Random rand = new Random (0);   // Give it a seed for consistency

using (Stream s = File.Create ("compressed.bin"))
using (Stream ds = new BrotliStream (s, CompressionMode.Compress))
using (TextWriter w = new StreamWriter (ds))
  for (int i = 0; i < 1000; i++)
    await w.WriteAsync (words [rand.Next (words.Length)] + " ");

Console.WriteLine (new FileInfo ("compressed.bin").Length);   // 808

using (Stream s = File.OpenRead ("compressed.bin"))
using (Stream ds = new BrotliStream (s, CompressionMode.Decompress))
using (TextReader r = new StreamReader (ds))
  Console.Write (await r.ReadToEndAsync());                   // Output below:


lazy lazy the fox the quick The brown fox jumps over fox over fox The 
brown brown brown over brown quick fox brown dog dog lazy fox dog brown 
over fox jumps lazy lazy quick The jumps fox jumps The over jumps dog... 


In this case, BrotliStream compresses efficiently to 808 bytes—less than 1 byte per 
word. (For comparison, DeflateStream compresses the same data to 885 bytes.) 
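To see how much regularity matters, the following sketch compresses two inputs of equal length with DeflateStream. The first is the sample sentence repeated verbatim. The second is pseudorandom bytes standing in for "dense" data. Exact sizes depend on the runtime's zlib build, so the comments describe only the trend:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static int DeflatedLength (byte[] data)
{
  var ms = new MemoryStream();
  using (var ds = new DeflateStream (ms, CompressionMode.Compress))
    ds.Write (data, 0, data.Length);
  return ms.ToArray().Length;
}

// Highly repetitive text: one sentence repeated 1,000 times.
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
  sb.Append ("The quick brown fox jumps over the lazy dog ");
byte[] text = Encoding.UTF8.GetBytes (sb.ToString());

// "Dense" data of the same length: pseudorandom bytes with no regularity.
byte[] noise = new byte [text.Length];
new Random (0).NextBytes (noise);

int textSize  = DeflatedLength (text);
int noiseSize = DeflatedLength (noise);

Console.WriteLine ($"text:  {text.Length} -> {textSize} bytes");   // Shrinks dramatically
Console.WriteLine ($"noise: {noise.Length} -> {noiseSize} bytes"); // Barely changes (or grows)
```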


Compressing in Memory 


Sometimes, you need to compress entirely in memory. Here’s how to use a 
MemoryStream for this purpose: 


byte[] data = new byte[1000];          // We can expect a good compression
                                       // ratio from an empty array!
var ms = new MemoryStream();
using (Stream ds = new DeflateStream (ms, CompressionMode.Compress))
  ds.Write (data, 0, data.Length);

byte[] compressed = ms.ToArray();
Console.WriteLine (compressed.Length);   // Far fewer than 1,000 bytes

// Decompress back to the data array:
ms = new MemoryStream (compressed);
using (Stream ds = new DeflateStream (ms, CompressionMode.Decompress))
  for (int i = 0; i < 1000; i += ds.Read (data, i, 1000 - i));

The using statement around the DeflateStream closes it in a textbook fashion,
flushing any unwritten buffers in the process. This also closes the MemoryStream it
wraps, which means we must then call ToArray to extract its data.
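The following sketch makes that consequence concrete. After the DeflateStream is disposed, reading the MemoryStream's Length throws ObjectDisposedException, whereas ToArray still returns the compressed bytes:

```csharp
using System;
using System.IO;
using System.IO.Compression;

var ms = new MemoryStream();
using (Stream ds = new DeflateStream (ms, CompressionMode.Compress))
  ds.Write (new byte[1000], 0, 1000);

// Disposing ds also disposed ms, so most of its members now throw...
bool lengthThrew = false;
try { _ = ms.Length; }
catch (ObjectDisposedException) { lengthThrew = true; }

// ...but ToArray is documented to work on a closed MemoryStream.
byte[] compressed = ms.ToArray();

Console.WriteLine (lengthThrew);          // True
Console.WriteLine (compressed.Length);    // Far fewer than 1,000 bytes
```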


Here’s an alternative that avoids closing the MemoryStream and uses the
asynchronous read and write methods:


byte[] data = new byte[1000];

MemoryStream ms = new MemoryStream();
using (Stream ds = new DeflateStream (ms, CompressionMode.Compress, true))
  await ds.WriteAsync (data, 0, data.Length);

Console.WriteLine (ms.Length);             // 113
ms.Position = 0;
using (Stream ds = new DeflateStream (ms, CompressionMode.Decompress))
  for (int i = 0; i < 1000; i += await ds.ReadAsync (data, i, 1000 - i));


The additional flag sent to DeflateStream’s constructor instructs it not to follow
the usual protocol of taking the underlying stream with it in disposal. In other
words, the MemoryStream is left open, allowing us to position it back to zero and
reread it.
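Here is a hedged, synchronous variation on the same idea. It confirms that ms.Length is still readable after the compressor is disposed. It also shows that the second DeflateStream, constructed without the flag, does close ms. The read loop stops on end of stream, because Read may return fewer bytes than requested:

```csharp
using System;
using System.IO;
using System.IO.Compression;

byte[] data = new byte[1000];
var ms = new MemoryStream();

// Third argument (leaveOpen: true): disposing ds leaves ms open.
using (var ds = new DeflateStream (ms, CompressionMode.Compress, true))
  ds.Write (data, 0, data.Length);

long compressedLength = ms.Length;   // Legal, because ms is still open
ms.Position = 0;                     // Rewind to decompress from the start

byte[] restored = new byte [data.Length];
int total = 0;
using (var ds = new DeflateStream (ms, CompressionMode.Decompress))   // Closes ms
{
  int n;
  // Read can return fewer bytes than requested, so loop until done or EOF.
  while (total < restored.Length &&
         (n = ds.Read (restored, total, restored.Length - total)) > 0)
    total += n;
}

Console.WriteLine (compressedLength);
Console.WriteLine (total);           // 1000
Console.WriteLine (ms.CanRead);      // False: leaveOpen defaulted to false this time
```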


Unix gzip File Compression 


GZipStream’s compression algorithm is popular on Unix systems as a file compression
format. Each source file is compressed into a separate target file with a .gz
extension.


The following methods do the work of the Unix command-line gzip and gunzip 
utilities: 


async Task GZip (string sourcefile, bool deleteSource = true)
{
  var gzipfile = $"{sourcefile}.gz";
  if (File.Exists (gzipfile))
    throw new Exception ("Gzip file already exists");

  // Compress
  using (FileStream inStream = File.Open (sourcefile, FileMode.Open))
  using (FileStream outStream = new FileStream (gzipfile, FileMode.CreateNew))
  using (GZipStream gzipStream =
         new GZipStream (outStream, CompressionMode.Compress))
    await inStream.CopyToAsync (gzipStream);

  if (deleteSource) File.Delete (sourcefile);
}

async Task GUnzip (string gzipfile, bool deleteGzip = true)
{
  if (Path.GetExtension (gzipfile) != ".gz")
    throw new Exception ("Not a gzip file");

  var uncompressedFile = gzipfile.Substring (0, gzipfile.Length - 3);
  if (File.Exists (uncompressedFile))
    throw new Exception ("Destination file already exists");

  // Uncompress
  using (FileStream uncompressToStream =
         File.Open (uncompressedFile, FileMode.Create))
  using (FileStream zipfileStream = File.Open (gzipfile, FileMode.Open))
  using (var unzipStream =
         new GZipStream (zipfileStream, CompressionMode.Decompress))
    await unzipStream.CopyToAsync (uncompressToStream);

  if (deleteGzip) File.Delete (gzipfile);
}


The following compresses a file:

await GZip ("/tmp/myfile.txt");        // Creates /tmp/myfile.txt.gz

And the following decompresses it:

await GUnzip ("/tmp/myfile.txt.gz");   // Creates /tmp/myfile.txt


Working with ZIP Files 


The ZipArchive and ZipFile classes in System.IO.Compression support the ZIP
compression format. The advantage of the ZIP format over DeflateStream and
GZipStream is that it acts as a container for multiple files and is compatible with ZIP
files created with Windows Explorer.


ZipArchive and ZipFile work in both Windows and Unix; 
however, the format is most popular in Windows. In Unix, 
the .tar format is more popular as a container for multiple 
files. You can read/write .tar files using a third-party library 
such as SharpZipLib. 


ZipArchive works with streams, whereas ZipFile addresses the more common scenario
of working with files. (ZipFile is a static helper class for ZipArchive.)


ZipFile’s CreateFromDirectory method adds all the files in a specified directory 
into a ZIP file: 


ZipFile.CreateFromDirectory (@"d:\MyFolder", @"d:\archive.zip");

ExtractToDirectory does the opposite and extracts a ZIP file to a directory:

ZipFile.ExtractToDirectory (@"d:\archive.zip", @"d:\MyFolder");

When compressing, you can specify whether to optimize for file size or speed as
well as whether to include the name of the source directory in the archive. Enabling
the latter option in our example would create a subdirectory in the archive called
MyFolder into which the compressed files would go.
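Both options appear in the CreateFromDirectory overload that takes a CompressionLevel and an includeBaseDirectory flag. The following sketch runs it against a scratch folder in the temp directory. The folder and file names are hypothetical, chosen to mirror the example above. Entry names are printed rather than assumed, because the exact separator stored in the archive is an implementation detail:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Throwaway folder and archive paths under the temp directory (illustrative only).
string work    = Path.Combine (Path.GetTempPath(), Path.GetRandomFileName());
string folder  = Path.Combine (work, "MyFolder");
string archive = Path.Combine (work, "archive.zip");
Directory.CreateDirectory (folder);
File.WriteAllText (Path.Combine (folder, "hello.txt"), "Hello");

// Optimize for speed, and store entries under a top-level "MyFolder" directory.
ZipFile.CreateFromDirectory (folder, archive, CompressionLevel.Fastest,
                             includeBaseDirectory: true);

var names = new List<string>();
using (ZipArchive zip = ZipFile.OpenRead (archive))
  foreach (ZipArchiveEntry entry in zip.Entries)
    names.Add (entry.FullName);

Console.WriteLine (string.Join (", ", names));   // e.g. MyFolder/hello.txt

Directory.Delete (work, recursive: true);
```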


ZipFile has an Open method for reading/writing individual entries. This returns a 
ZipArchive object (which you can also obtain by instantiating ZipArchive with a 
Stream object). When calling Open, you must specify a filename and indicate 
whether you want to Read, Create, or Update the archive. You can then enumerate 
existing entries via the Entries property or find a particular file by calling 
GetEntry: 


using (ZipArchive zip = ZipFile.Open (@"d:\zz.zip", ZipArchiveMode.Read))
  foreach (ZipArchiveEntry entry in zip.Entries)
    Console.WriteLine (entry.FullName + " " + entry.Length);


ZipArchiveEntry also has a Delete method, an ExtractToFile method (this is
actually an extension method in the ZipFileExtensions class), and an Open method
that returns a readable/writable Stream. You can create new entries by calling
CreateEntry (or the CreateEntryFromFile extension method) on the ZipArchive.
The following creates the archive d:\zz.zip, to which it adds foo.dll, under a directory
structure within the archive called bin\X64:


byte[] data = File.ReadAllBytes (@"d:\foo.dll");
using (ZipArchive zip = ZipFile.Open (@"d:\zz.zip", ZipArchiveMode.Update))
  zip.CreateEntry (@"bin\X64\foo.dll").Open().Write (data, 0, data.Length);


You could do the same thing entirely in memory by constructing ZipArchive with a 
MemoryStream. 
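A minimal in-memory sketch follows. It creates an entry, then reopens the same MemoryStream to read the entry back. Passing leaveOpen: true to the first ZipArchive keeps the stream alive between the two steps. The entry name uses forward slashes, which the ZIP format specifies as the path separator inside archives. The name and text are illustrative:

```csharp
using System;
using System.IO;
using System.IO.Compression;

var ms = new MemoryStream();

// Create mode: write one entry. leaveOpen: true keeps ms usable afterward.
using (var zip = new ZipArchive (ms, ZipArchiveMode.Create, true))
{
  ZipArchiveEntry entry = zip.CreateEntry ("bin/X64/readme.txt");
  using (var writer = new StreamWriter (entry.Open()))
    writer.Write ("Hello from a ZIP entry");
}

// Read mode: reopen the same bytes and read the entry back.
ms.Position = 0;
string name, content;
using (var zip = new ZipArchive (ms, ZipArchiveMode.Read))
{
  ZipArchiveEntry entry = zip.GetEntry ("bin/X64/readme.txt");
  name = entry.FullName;
  using (var reader = new StreamReader (entry.Open()))
    content = reader.ReadToEnd();
}

Console.WriteLine (name);      // bin/X64/readme.txt
Console.WriteLine (content);   // Hello from a ZIP entry
```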


File and Directory Operations 


The System.IO namespace provides a set of types for performing “utility” file and
directory operations, such as copying and moving, creating directories, and setting
file attributes and permissions. For most features, you can choose between either of
two classes, one offering static methods and the other instance methods:


Static classes 
File and Directory 


Instance-method classes (constructed with a file or directory name) 
FileInfo and DirectoryInfo 


Additionally, there's a static class called Path. This does nothing to files or
directories; instead, it provides string manipulation methods for filenames and
directory paths. Path also assists with temporary files.


For UWP applications, also see “File I/O in UWP” on page 676.

The File Class

File is a static class whose methods all accept a filename. The filename can be either
relative to the current directory or fully qualified with a directory. Here are its
methods (all public and static):


bool Exists (string path);      // Returns true if the file is present

void Delete  (string path);
void Copy    (string sourceFileName, string destFileName);
void Move    (string sourceFileName, string destFileName);
void Replace (string sourceFileName, string destinationFileName,
                                     string destinationBackupFileName);

FileAttributes GetAttributes (string path);
void SetAttributes           (string path, FileAttributes fileAttributes);

void Decrypt (string path);
void Encrypt (string path);

DateTime GetCreationTime   (string path);      // UTC versions are
DateTime GetLastAccessTime (string path);      // also provided.
DateTime GetLastWriteTime  (string path);

void SetCreationTime   (string path, DateTime creationTime);
void SetLastAccessTime (string path, DateTime lastAccessTime);
void SetLastWriteTime  (string path, DateTime lastWriteTime);

FileSecurity GetAccessControl (string path);
FileSecurity GetAccessControl (string path,
                               AccessControlSections includeSections);
void SetAccessControl (string path, FileSecurity fileSecurity);


Move throws an exception if the destination file already exists; Replace does not. 
Both methods allow the file to be renamed as well as moved to another directory. 
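The following sketch checks the first claim in a scratch folder with two hypothetical files. It also uses the File.Move overload that takes an overwrite parameter. To our knowledge, that overload arrived in .NET Core 3.0; on older runtimes, you'd delete the destination first or call Replace:

```csharp
using System;
using System.IO;

// Scratch folder with two files (hypothetical names).
string dir = Path.Combine (Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory (dir);
string a = Path.Combine (dir, "a.txt");
string b = Path.Combine (dir, "b.txt");
File.WriteAllText (a, "A");
File.WriteAllText (b, "B");

bool threw = false;
try { File.Move (a, b); }                // b already exists
catch (IOException) { threw = true; }

File.Move (a, b, overwrite: true);       // Replaces b (.NET Core 3.0 and later)
string bContent = File.ReadAllText (b);
bool aExists = File.Exists (a);

Console.WriteLine (threw);      // True
Console.WriteLine (bContent);   // A
Console.WriteLine (aExists);    // False

Directory.Delete (dir, recursive: true);
```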


Delete throws an UnauthorizedAccessException if the file is marked read-only;
you can tell this in advance by calling GetAttributes. It also throws that exception
if the OS denies delete permission for that file to your process. Here are all the
members of the FileAttributes enum that GetAttributes returns:


Archive, Compressed, Device, Directory, Encrypted, 
Hidden, IntegritySystem, Normal, NoScrubData, NotContentIndexed, 
Offline, ReadOnly, ReparsePoint, SparseFile, System, Temporary 


Members in this enum are combinable. Here’s how to toggle a single file attribute 
without upsetting the rest: 


string filePath = "test.txt";

FileAttributes fa = File.GetAttributes (filePath);
if ((fa & FileAttributes.ReadOnly) != 0)
{
  // Use the exclusive-or operator (^) to toggle the ReadOnly flag
  fa ^= FileAttributes.ReadOnly;
  File.SetAttributes (filePath, fa);
}

// Now we can delete the file, for instance:
File.Delete (filePath);


FileInfo offers an easier way to change a file’s read-only flag: 


new FileInfo ("test.txt").IsReadOnly = false; 
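Because FileAttributes is a flags enum, the same bitwise operators can test, set, clear, and toggle individual attributes. This sketch works on a value in memory only, so no file is involved:

```csharp
using System;
using System.IO;

// Pure flag arithmetic: no file is read or written.
FileAttributes fa = FileAttributes.ReadOnly | FileAttributes.Archive;

bool isReadOnly = (fa & FileAttributes.ReadOnly) != 0;  // Test a flag
fa &= ~FileAttributes.ReadOnly;                          // Clear a flag
fa |= FileAttributes.Hidden;                             // Set a flag
fa ^= FileAttributes.Archive;                            // Toggle a flag (here: off)

Console.WriteLine (isReadOnly);   // True
Console.WriteLine (fa);           // Hidden
```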


Compression and encryption attributes 


The Compressed and Encrypted file attributes correspond to the compression and 
encryption checkboxes on a file or directory’s Properties dialog box in Windows 
Explorer. This type of compression and encryption is transparent in that the OS 
does all the work behind the scenes, allowing you to read and write plain data. 


This feature is Windows-only and requires the NuGet package 
System.Management. 


You cannot use SetAttributes to change a file’s Compressed or Encrypted
attributes; it fails silently if you try! The workaround is simple in the latter case:
you instead call the Encrypt() and Decrypt() methods in the File class. With
compression, it’s more complicated; one solution is to use the Windows Management
Instrumentation (WMI) API in System.Management. The following method
compresses a directory, returning 0 if successful (or a WMI error code if not):


static uint CompressFolder (string folder, bool recursive)
{
  string path = "Win32_Directory.Name='" + folder + "'";
  using (ManagementObject dir = new ManagementObject (path))
  using (ManagementBaseObject p = dir.GetMethodParameters ("CompressEx"))
  {
    p ["Recursive"] = recursive;
    using (ManagementBaseObject result = dir.InvokeMethod ("CompressEx",
                                                             p, null))
      return (uint) result.Properties ["ReturnValue"].Value;
  }
}


To uncompress, replace CompressEx with UncompressEx. 


Transparent encryption relies on a key seeded from the logged-in user’s password.
The system is robust to password changes performed by the authenticated user, but
if a password is reset via an administrator, data in encrypted files is unrecoverable.


Transparent encryption and compression require special filesystem
support. NTFS (used most commonly on hard drives)
supports these features; CDFS (on CD-ROMs) and FAT (on
removable media cards) do not.

You can determine whether a volume supports compression and encryption with
Win32 interop:


using System;
using System.IO;
using System.Text;
using System.ComponentModel;
using System.Runtime.InteropServices;

class SupportsCompressionEncryption
{
  const int SupportsCompression = 0x10;
  const int SupportsEncryption = 0x20000;

  [DllImport ("Kernel32.dll", SetLastError = true)]
  extern static bool GetVolumeInformation (string vol, StringBuilder name,
    int nameSize, out uint serialNum, out uint maxNameLen, out uint flags,
    StringBuilder fileSysName, int fileSysNameSize);

  static void Main()
  {
    uint serialNum, maxNameLen, flags;
    bool ok = GetVolumeInformation (@"C:\", null, 0, out serialNum,
                                    out maxNameLen, out flags, null, 0);
    if (!ok)
      throw new Win32Exception();

    bool canCompress = (flags & SupportsCompression) != 0;
    bool canEncrypt = (flags & SupportsEncryption) != 0;
  }
}

File security 


The FileSecurity class allows you to query and change the OS permissions
assigned to users and roles (namespace System.Security.AccessControl).


This feature is Windows-only and requires the NuGet package
System.IO.FileSystem.AccessControl.


In this example, we list a file’s existing permissions and then assign Write
permission to the “Users” group:


using System;
using System.IO;
using System.Linq;                        // For Cast and OrderBy
using System.Security.AccessControl;
using System.Security.Principal;

void ShowSecurity (FileSecurity sec)
{
  AuthorizationRuleCollection rules = sec.GetAccessRules (true, true,
                                                          typeof (NTAccount));
  foreach (FileSystemAccessRule r in rules.Cast<FileSystemAccessRule>()
    .OrderBy (rule => rule.IdentityReference.Value))
  {
    // e.g., MyDomain\Joe
    Console.WriteLine ($"  {r.IdentityReference.Value}");
    // Allow or Deny: e.g., FullControl
    Console.WriteLine ($"    {r.FileSystemRights}: {r.AccessControlType}");
  }
}

var file = "sectest.txt";
File.WriteAllText (file, "File security test.");

var sid = new SecurityIdentifier (WellKnownSidType.BuiltinUsersSid, null);
string usersAccount = sid.Translate (typeof (NTAccount)).ToString();

Console.WriteLine ($"User: {usersAccount}");

FileSecurity sec = new FileSecurity (file,
                    AccessControlSections.Owner |
                    AccessControlSections.Group |
                    AccessControlSections.Access);

Console.WriteLine ("AFTER CREATE:");
ShowSecurity (sec);   // BUILTIN\Users doesn't have Write permission

sec.ModifyAccessRule (AccessControlModification.Add,
    new FileSystemAccessRule (usersAccount, FileSystemRights.Write,
                              AccessControlType.Allow),
    out bool modified);

Console.WriteLine ("AFTER MODIFY:");
ShowSecurity (sec);   // BUILTIN\Users has Write permission


We give another example, later, in “Special Folders” on page 673. 


The Directory Class 


The static Directory class provides a set of methods analogous to those in the File
class for checking whether a directory exists (Exists), moving a directory (Move),
deleting a directory (Delete), getting/setting times of creation or last access, and
getting/setting security permissions. Furthermore, Directory exposes the following
static methods:

string GetCurrentDirectory ();
void   SetCurrentDirectory (string path);

DirectoryInfo CreateDirectory  (string path);
DirectoryInfo GetParent        (string path);
string        GetDirectoryRoot (string path);

string[] GetLogicalDrives();   // Gets mount points on Unix

// The following methods all return full paths:

string[] GetFiles             (string path);
string[] GetDirectories       (string path);
string[] GetFileSystemEntries (string path);

IEnumerable<string> EnumerateFiles             (string path);
IEnumerable<string> EnumerateDirectories       (string path);
IEnumerable<string> EnumerateFileSystemEntries (string path);


The last three methods are potentially more efficient than the 
Get* variants because they're lazily evaluated—fetching data 
from the file system as you enumerate the sequence. They're 
particularly well suited to LINQ queries. 


The Enumerate* and Get* methods are overloaded to also accept searchPattern
(string) and searchOption (enum) parameters. If you specify
SearchOption.AllDirectories, a recursive subdirectory search is performed. The
*FileSystemEntries methods combine the results of *Files with *Directories.
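The following sketch builds a small scratch tree (the layout is hypothetical) and combines a search pattern with SearchOption.AllDirectories. Because EnumerateFiles is lazy, the LINQ operators pull paths from the file system only as the query runs:

```csharp
using System;
using System.IO;
using System.Linq;

// Scratch tree (hypothetical layout): root/a.txt, root/b.log, root/sub/c.txt
string root = Path.Combine (Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory (Path.Combine (root, "sub"));   // Creates root as well
File.WriteAllText (Path.Combine (root, "a.txt"), "");
File.WriteAllText (Path.Combine (root, "b.log"), "");
File.WriteAllText (Path.Combine (root, "sub", "c.txt"), "");

// Lazily enumerate *.txt files in root and all of its subdirectories.
string[] txtNames = Directory
  .EnumerateFiles (root, "*.txt", SearchOption.AllDirectories)
  .Select (f => Path.GetFileName (f))
  .OrderBy (n => n, StringComparer.Ordinal)
  .ToArray();

Console.WriteLine (string.Join (", ", txtNames));   // a.txt, c.txt

Directory.Delete (root, recursive: true);
```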


Here's how to create a directory if it doesn’t already exist: 


if (!Directory.Exists (@"d:\test")) 
Directory.CreateDirectory (@"d:\test"); 


FileInfo and DirectoryInfo


The static methods on File and Directory are convenient for executing a single file 
or directory operation. If you need to call a series of methods in a row, the FileInfo 
and DirectoryInfo classes provide an object model that makes the job easier. 


FileInfo offers most of File’s static methods in instance form, along with some
additional properties such as Extension, Length, IsReadOnly, and Directory (which
returns a DirectoryInfo object):


static string TestDirectory =>
  RuntimeInformation.IsOSPlatform (OSPlatform.Windows)
    ? @"C:\Temp"
    : "/tmp";

Directory.CreateDirectory (TestDirectory);

FileInfo fi = new FileInfo (Path.Combine (TestDirectory, "FileInfo.txt"));

Console.WriteLine (fi.Exists);         // false

using (TextWriter w = fi.CreateText())
  w.Write ("Some text");

Console.WriteLine (fi.Exists);         // false (still)
fi.Refresh();
Console.WriteLine (fi.Exists);         // true

Console.WriteLine (fi.Name);           // FileInfo.txt
Console.WriteLine (fi.FullName);       // c:\temp\FileInfo.txt (Windows)
                                       // /tmp/FileInfo.txt (Unix)
Console.WriteLine (fi.DirectoryName);  // c:\temp (Windows)
                                       // /tmp (Unix)
Console.WriteLine (fi.Directory.Name); // temp or tmp
Console.WriteLine (fi.Extension);      // .txt
Console.WriteLine (fi.Length);         // 9

fi.Encrypt();
fi.Attributes ^= FileAttributes.Hidden;   // (Toggle hidden flag)
fi.IsReadOnly = true;

Console.WriteLine (fi.Attributes);     // ReadOnly, Archive, Hidden, Encrypted
Console.WriteLine (fi.CreationTime);   // 3/09/2019 1:24:05 PM

fi.MoveTo (Path.Combine (TestDirectory, "FileInfox.txt"));

DirectoryInfo di = fi.Directory;

Console.WriteLine (di.Name);             // temp or tmp
Console.WriteLine (di.FullName);         // c:\temp or /tmp
Console.WriteLine (di.Parent.FullName);  // c:\ or /
di.CreateSubdirectory ("SubFolder");


Here's how to use DirectoryInfo to enumerate files and subdirectories: 


DirectoryInfo di = new DirectoryInfo (@"e:\photos"); 


foreach (FileInfo fi in di.GetFiles ("*.jpg")) 
Console.WriteLine (fi.Name); 


foreach (DirectoryInfo subDir in di.GetDirectories()) 
Console.WriteLine (subDir.FullName); 


Path 


The static Path class defines methods and fields for working with paths and 
filenames. 


Assuming this setup code:

string dir  = @"c:\mydir";               // or /mydir
string file = "myfile.txt";
string path = @"c:\mydir\myfile.txt";    // or /mydir/myfile.txt

Directory.SetCurrentDirectory (@"k:\demo");   // or /demo


we can demonstrate Path’s methods and fields with the following expressions: 


Expression                                 Result (Windows, then Unix)
Directory.GetCurrentDirectory()            k:\demo\ or /demo
Path.IsPathRooted (file)                   False
Path.IsPathRooted (path)                   True
Path.GetPathRoot (path)                    c:\ or /
Path.GetDirectoryName (path)               c:\mydir or /mydir
Path.GetFileName (path)                    myfile.txt
Path.GetFullPath (file)                    k:\demo\myfile.txt or /demo/myfile.txt
Path.Combine (dir, file)                   c:\mydir\myfile.txt or /mydir/myfile.txt
File extensions:
Path.HasExtension (file)                   True
Path.GetExtension (file)                   .txt
Path.GetFileNameWithoutExtension (file)    myfile
Path.ChangeExtension (file, ".log")        myfile.log
Separators and characters:
Path.DirectorySeparatorChar                \ or /
Path.AltDirectorySeparatorChar             /
Path.PathSeparator                         ; or :
Path.VolumeSeparatorChar                   : or /
Path.GetInvalidPathChars()                 chars 0 to 31 and "<>| or 0
Path.GetInvalidFileNameChars()             chars 0 to 31 and "<>|:*?\/ or 0 and /
Temporary files:
Path.GetTempPath()                         <local user folder>\Temp or /tmp/
Path.GetRandomFileName()                   d2dwuzjf.dnp
Path.GetTempFileName()                     <local user folder>\Temp\tmp14B.tmp or
                                           /tmp/tmpubSUYO.tmp





Combine is particularly useful: it allows you to combine a directory and filename— 
or two directories—without first having to check whether a trailing path separator is 
present, and it automatically uses the correct path separator for the OS. It provides 
overloads that accept up to four directory and/or filenames. 
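A short sketch of Combine's separator handling follows. It also shows one trap worth knowing: if a later argument is itself rooted, Combine discards the arguments before it. The extension helpers are included to show that Path operates purely on strings, so none of these paths need exist:

```csharp
using System;
using System.IO;

char sep = Path.DirectorySeparatorChar;     // '\' on Windows, '/' on Unix

// Combine adds a separator only when one is missing:
string p1 = Path.Combine ("logs", "app.txt");
string p2 = Path.Combine ("logs" + sep, "app.txt");
Console.WriteLine (p1 == p2);               // True

// A rooted later argument discards everything before it:
string rooted = Path.Combine ("logs", sep + "etc");
Console.WriteLine (rooted);                 // \etc on Windows, /etc on Unix

// Extension helpers are pure string operations; the file need not exist:
Console.WriteLine (Path.GetExtension (p1));                 // .txt
Console.WriteLine (Path.GetFileNameWithoutExtension (p1));  // app
Console.WriteLine (Path.ChangeExtension (p1, ".log"));      // logs\app.log or logs/app.log
```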


GetFullPath converts a path relative to the current directory to an absolute path. It
accepts values such as ..\..\file.txt.


GetRandomFileName returns a random 8.3-style filename that is unique for all
practical purposes, without actually creating any file. GetTempFileName generates a
temporary filename using an autoincrementing counter that repeats every 65,000 files.
It then creates a zero-byte file of this name in the local temporary directory.







You must delete the file generated by GetTempFileName when
you’re done; otherwise, it will eventually throw an exception
(after your 65,000th call to GetTempFileName). If this is a
problem, you can instead Combine GetTempPath with
GetRandomFileName. Just be careful not to fill up the user’s hard
drive!
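The Combine-based alternative might look like the following sketch. Path.GetRandomFileName creates nothing on disk, so the code creates the file itself and takes responsibility for deleting it. The file contents are arbitrary:

```csharp
using System;
using System.IO;

string name = Path.GetRandomFileName();                  // e.g. d2dwuzjf.dnp
string tempFile = Path.Combine (Path.GetTempPath(), name);
bool existedBefore = File.Exists (tempFile);             // Nothing is created for us

File.WriteAllText (tempFile, "scratch data");
string readBack;
try
{
  readBack = File.ReadAllText (tempFile);
}
finally
{
  File.Delete (tempFile);   // Cleanup is our job, just as with GetTempFileName
}

Console.WriteLine (existedBefore);   // False
Console.WriteLine (readBack);        // scratch data
```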


Special Folders 


One thing missing from Path and Directory is a means to locate folders such as My
Documents, Program Files, Application Data, and so on. This is provided instead by
the GetFolderPath method in the System.Environment class:


string myDocPath = Environment.GetFolderPath
  (Environment.SpecialFolder.MyDocuments);


Environment.SpecialFolder is an enum whose values encompass all special
directories in Windows, such as AdminTools, ApplicationData, Fonts, History, SendTo,
StartMenu, and so on. Everything is covered here except the .NET Core directory,
which you can obtain as follows:


System.Runtime.InteropServices.RuntimeEnvironment.GetRuntimeDirectory()


Most of the special folders have no path assigned on Unix
systems. The following have paths on Ubuntu Linux 18.04
Desktop: ApplicationData, CommonApplicationData,
Desktop, DesktopDirectory, LocalApplicationData,
MyDocuments, MyMusic, MyPictures, MyVideos, Templates, and
UserProfile.


Of particular value on Windows systems are ApplicationData, which is where you can
store settings that travel with a user across a network (if roaming profiles are
enabled on the network domain); LocalApplicationData, which is for nonroaming data
(specific to the logged-in user); and CommonApplicationData, which is shared by
every user of the computer. Writing application data to these folders is
considered preferable to using the Windows Registry. The standard protocol for
storing data in these folders is to create a subdirectory with the name of your
application:

string localAppDataPath = Path.Combine (
  Environment.GetFolderPath (Environment.SpecialFolder.ApplicationData),
  "MyCoolApplication");

if (!Directory.Exists (localAppDataPath))
  Directory.CreateDirectory (localAppDataPath);


There’s a horrible trap when using CommonApplicationData: if a user starts your
program with administrative elevation and your program then creates folders and
files in CommonApplicationData, that user might lack permissions to replace those
files later, when run under a restricted Windows login. (A similar problem exists
when switching between restricted-permission accounts.) You can work around it
by creating the desired folder (with permissions assigned to everyone) as part of
your setup.


Another place to write configuration and log files is to the application’s base
directory, which you can obtain with AppDomain.CurrentDomain.BaseDirectory. This is
not recommended, however, because the OS is likely to deny your application
permissions to write to this folder after initial installation (without administrative
elevation).


Querying Volume Information 


You can query the drives on a computer with the DriveInfo class: 


DriveInfo c = new DriveInfo ("C");     // Query the C: drive.
                                       // On Unix: /
long totalSize = c.TotalSize;          // Size in bytes.
long freeBytes = c.TotalFreeSpace;     // Ignores disk quotas.
long freeToMe  = c.AvailableFreeSpace; // Takes quotas into account.

foreach (DriveInfo d in DriveInfo.GetDrives())   // All defined drives.
                                                 // On Unix: mount points
{
  Console.WriteLine (d.Name);             // C:\
  Console.WriteLine (d.DriveType);        // Fixed
  Console.WriteLine (d.RootDirectory);    // C:\

  if (d.IsReady)   // If the drive is not ready, the following two
                   // properties will throw exceptions:
  {
    Console.WriteLine (d.VolumeLabel);    // The Sea Drive
    Console.WriteLine (d.DriveFormat);    // NTFS
  }
}

The static GetDrives method returns all mapped drives, including CD-ROMs,
media cards, and network connections. DriveType is an enum with the following
values:


Unknown, NoRootDirectory, Removable, Fixed, Network, CDRom, Ram 


Catching Filesystem Events 


The FileSystemWatcher class lets you monitor a directory (and optionally,
subdirectories) for activity. FileSystemWatcher has events that fire when files or
subdirectories are created, modified, renamed, and deleted, as well as when their
attributes change. These events fire regardless of the user or process performing the
change. Here’s an example:


static void Main() => Watch (TestDirectory, "*.txt", true);

static void Watch (string path, string filter, bool includeSubDirs)
{
  using (var watcher = new FileSystemWatcher (path, filter))
  {
    watcher.Created += FileCreatedChangedDeleted;
    watcher.Changed += FileCreatedChangedDeleted;
    watcher.Deleted += FileCreatedChangedDeleted;
    watcher.Renamed += FileRenamed;
    watcher.Error   += FileError;

    watcher.IncludeSubdirectories = includeSubDirs;
    watcher.EnableRaisingEvents = true;

    Console.WriteLine ("Listening for events - press <enter> to end");
    Console.ReadLine();
  }
  // Disposing the FileSystemWatcher stops further events from firing.
}

static void FileCreatedChangedDeleted (object o, FileSystemEventArgs e)
  => Console.WriteLine ("File {0} has been {1}", e.FullPath, e.ChangeType);

static void FileRenamed (object o, RenamedEventArgs e)
  => Console.WriteLine ("Renamed: {0}->{1}", e.OldFullPath, e.FullPath);

static void FileError (object o, ErrorEventArgs e)
  => Console.WriteLine ("Error: " + e.GetException().Message);

static string TestDirectory =>
  RuntimeInformation.IsOSPlatform (OSPlatform.Windows)
    ? @"C:\Temp"
    : "/tmp";


The Error event does not inform you of filesystem errors; instead, it indicates that
the FileSystemWatcher’s event buffer overflowed because it was overwhelmed by
Changed, Created, Deleted, or Renamed events. You can change the buffer size via
the InternalBufferSize property.

Because FileSystemWatcher raises events on a separate
thread, you must exception-handle the event handling code to
prevent an error from taking down the application. For more
information, see “Exception Handling” on page 584.


IncludeSubdirectories applies recursively. So, if you create a FileSystemWatcher
on C:\ with IncludeSubdirectories true, its events will fire when a file or
directory changes anywhere on the hard drive.


A trap in using FileSystemWatcher is to open and read newly 
created or updated files before the file has been fully populated 
or updated. If you're working in conjunction with some other 
software that’s creating files, you might need to consider some 
strategy to mitigate this, such as creating files with an 
unwatched extension and then renaming them after they’re 
fully written. 
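A minimal sketch of the rename strategy follows. It assumes a watcher that filters on *.txt, as in the earlier example, and uses .partial as the hypothetical unwatched extension. The file appears under its watched name only when File.Move runs, after the writer has been disposed and its contents flushed:

```csharp
using System;
using System.IO;

// Hypothetical scratch folder; imagine a FileSystemWatcher filtering on "*.txt".
string dir = Path.Combine (Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory (dir);

string finalPath   = Path.Combine (dir, "report.txt");
string partialPath = finalPath + ".partial";   // Unwatched extension

// 1. Write everything under the unwatched name. Disposing the writer flushes it.
using (var writer = new StreamWriter (partialPath))
  for (int i = 0; i < 1000; i++)
    writer.WriteLine ($"line {i}");

// 2. Rename only when complete, so the *.txt name appears in a single rename.
File.Move (partialPath, finalPath);

int lineCount = File.ReadAllLines (finalPath).Length;
bool partialGone = !File.Exists (partialPath);
Console.WriteLine (lineCount);     // 1000
Console.WriteLine (partialGone);   // True

Directory.Delete (dir, recursive: true);
```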
File I/O in UWP


UWP applications are restricted in terms of the directories and files that they can 
access. The easiest way to navigate the restrictions is to use the WinRT types in the 
Windows.Storage namespace, the two primary classes being StorageFolder and 
StorageFile. 


In Windows Runtime for Windows 8 and 8.1, you couldn't use
FileStream or the Directory/File classes at all. This made it
more difficult to write portable class libraries, so this
restriction has been relaxed in UWP for Windows 10, although the
limits on what directories and files you can access still apply.


Working with Directories 


The StorageFolder class represents a directory. You can obtain a StorageFolder
via its static method GetFolderFromPathAsync, giving it a full path to the folder.
However, given that UWP lets you access files only in certain locations, an easier
approach is to obtain a StorageFolder via a helper property such as
ApplicationData.Current.TemporaryFolder, which returns a temporary folder that’s
isolated to your application.


We describe all of the approaches for obtaining directories 
and files to which your application has access in “Obtaining 
Directories and Files” on page 677. 


StorageFolder has the properties you’d expect (Name, Path, DateCreated,
DateModified, Attributes, and so on), methods to delete/rename the folder
(DeleteAsync/RenameAsync), and methods to list files and subfolders
(GetFilesAsync and GetFoldersAsync).


As is evident from their names, the methods are asynchronous, returning an object
that you can convert into a task with the AsTask extension method, or directly
await. The following obtains a directory listing of all files in the application’s
temporary folder:


StorageFolder tempFolder = ApplicationData.Current.TemporaryFolder;
IReadOnlyList<StorageFile> files = await tempFolder.GetFilesAsync();
foreach (IStorageFile file in files)
  Debug.WriteLine (file.Name);


The CreateFileQueryWithOptions method lets you filter to a specific extension:

StorageFolder tempFolder = ApplicationData.Current.TemporaryFolder;
var queryOptions = new QueryOptions (CommonFileQuery.DefaultQuery,
                                     new[] { ".txt" });
var txtFiles = await tempFolder.CreateFileQueryWithOptions (queryOptions)
                               .GetFilesAsync();
foreach (StorageFile file in txtFiles)
  Debug.WriteLine (file.Name);

The QueryOptions class exposes properties to further control the search. For
example, the FolderDepth property requests a recursive directory listing:


queryOptions.FolderDepth = FolderDepth.Deep; 


Working with Files 


StorageFile is the primary class for working with files. You can obtain an instance 
from a full path (to which you have permission) with the static 
StorageFile.GetFileFromPathAsync method, or from a relative path by calling the 
GetFileAsync method on a StorageFolder (or IStorageFolder) object: 


StorageFolder tempFolder = ApplicationData.Current.TemporaryFolder; 
StorageFile file = await tempFolder.GetFileAsync ("foo.txt"); 


If the file does not exist, a FileNotFoundException is thrown at that point. 


StorageFile has properties such as Name, Path, and so on, and methods for work- 
ing with files, such as Move, Rename, Copy, and Delete (all Async). The CopyAsync 
method returns a StorageFile corresponding to the new file. There's also a 
CopyAndReplaceAsync, which accepts a target StorageFile object rather than a tar- 
get name and folder. 


StorageFile also exposes methods to open the file for reading/writing via .NET 
streams (OpenStreamForReadAsync and OpenStreamForWriteAsync). For example, 
the following creates and writes to a file called test.txt in the temporary folder: 


StorageFolder tempFolder = ApplicationData.Current.TemporaryFolder;

StorageFile file = await tempFolder.CreateFileAsync
  ("test.txt", CreationCollisionOption.ReplaceExisting);

using (Stream stream = await file.OpenStreamForWriteAsync())
using (StreamWriter writer = new StreamWriter (stream))
  await writer.WriteLineAsync ("This is a test");


If you don't specify CreationCollisionOption.ReplaceExisting 
and the file already exists, it will automatically append a 
number to the filename to make it unique. 


The following reads back the file: 


StorageFolder tempFolder = ApplicationData.Current.TemporaryFolder;
StorageFile file = await tempFolder.GetFileAsync ("test.txt");

using (var stream = await file.OpenStreamForReadAsync())
using (StreamReader reader = new StreamReader (stream))
  Debug.WriteLine (await reader.ReadToEndAsync());


Obtaining Directories and Files 


In this section, we describe all of the locations to which UWP apps can potentially 
read and write files, and how to obtain them. 
Isolated storage 


The following ApplicationData folders are all private to your app and support both 
reading and writing: 


Windows.Storage.ApplicationData.Current.LocalFolder 
Windows.Storage.ApplicationData.Current.RoamingFolder 
Windows.Storage.ApplicationData.Current.TemporaryFolder 


The following writes, reads, and then deletes a file in LocalFolder: 


StorageFolder localFolder = ApplicationData.Current.LocalFolder; 
var myFile = Path.Combine (localFolder.Path, "full.txt"); 

await File.WriteAllTextAsync (myFile, "My Data"); 

var data = await File.ReadAllTextAsync (myFile); 

File.Delete (myFile); 


Application folder 


A UWP app has read-only access to the folder in which the application is installed. 
There are two ways to access this folder; the first is to use the Package class in the 
Windows.ApplicationModel namespace to obtain a StorageFolder: 


StorageFolder installedLocation = Package.Current.InstalledLocation; 
string txt = await File.ReadAllTextAsync ( 
Path.Combine (installedLocation.Path, "test.txt")); 


The second is to directly obtain a StorageFile with an app URI: 


StorageFile file = await StorageFile.GetFileFromApplicationUriAsync
  (new Uri ("ms-appx:///test.txt"));

using (var st = await file.OpenStreamForReadAsync())
using (var tr = new StreamReader (st))
  Console.WriteLine (await tr.ReadToEndAsync());


KnownFolders 


The KnownFolders class exposes a static property for each of the following (poten- 
tially) permitted locations: 


public static StorageFolder AppCaptures { get; } 
public static StorageFolder CameraRoll { get; } 
public static StorageFolder DocumentsLibrary { get; } 
public static StorageFolder HomeGroup { get; } 
public static StorageFolder MediaServerDevices { get; } 
public static StorageFolder MusicLibrary { get; } 
public static StorageFolder Objects3D { get; } 
public static StorageFolder PicturesLibrary { get; } 
public static StorageFolder Playlists { get; } 
public static StorageFolder RecordedCalls { get; } 
public static StorageFolder RemovableDevices { get; } 
public static StorageFolder SavedPictures { get; } 
public static StorageFolder VideosLibrary { get; } 
If you want to access any of these locations, you must declare them in the 
application’s package manifest (in Visual Studio 2019, you can directly edit the manifest; in 
Solution Explorer, right-click the manifest file and then choose View Code): 


<Capabilities> 
<Capability Name="internetClient" /> 
<uap:Capability Name="documentsLibrary" /> 
</Capabilities> 
In addition, UWP applications can access only those files whose extensions match 
their declared file type associations, which you can specify in Visual Studio 2019's 
manifest editor, on the Declarations tab. 


KnownFolders also has properties for accessing removable devices and home group 
folders. 


Removable devices 


If your app uses the AutoPlay extension, it can access files on connected devices, 
provided that the file extension is declared in the application manifest. 


Downloads folder 


UWP apps can create files in the Downloads folder and have full access to the files 
created. However, you can do so only through the StorageFile instance; you can- 
not use methods such as File.WriteAllTextAsync or File.Delete: 


StorageFile newFile = await DownloadsFolder.CreateFileAsync
  ("MyDownload.txt");

using (var st = await newFile.OpenStreamForWriteAsync())
using (var tw = new StreamWriter (st))
  tw.Write ("My data");

using (var st = await newFile.OpenStreamForReadAsync())
using (var tr = new StreamReader (st))
{
  var txt = await tr.ReadToEndAsync();
}


await newFile.DeleteAsync(); 


User-selected files and folders 


Your UWP application can also access any file or folder that the user explicitly 
chooses via a FileOpenPicker or FolderPicker dialog (subject to normal OS per- 
missions for the underlying user). 


Using a FileOpenPicker: 


FileOpenPicker openPicker = new FileOpenPicker();
openPicker.ViewMode = PickerViewMode.Thumbnail;
openPicker.SuggestedStartLocation = PickerLocationId.Desktop;
openPicker.FileTypeFilter.Add (".txt");

StorageFile picked = await openPicker.PickSingleFileAsync();
if (picked != null) 
{ 
using (var st = await picked.OpenStreamForReadAsync()) 
using (var sr = new StreamReader (st)) 
{ 
var txt = sr.ReadToEnd(); 
} 
} 


Using a FolderPicker: 


FolderPicker dirPicker = new FolderPicker();
dirPicker.ViewMode = PickerViewMode.Thumbnail;
dirPicker.SuggestedStartLocation = PickerLocationId.Desktop;
dirPicker.FileTypeFilter.Add (".txt");

StorageFolder userFolder = await dirPicker.PickSingleFolderAsync();
if (userFolder != null)
{
  var userFile = await userFolder.CreateFileAsync ("InUserFolder.txt");
  using (var st = await userFile.OpenStreamForWriteAsync())
  using (var sw = new StreamWriter (st))
    sw.Write ("My data file in user-picked folder.");

  using (var st = await userFile.OpenStreamForReadAsync())
  using (var sr = new StreamReader (st))
  {
    var txt = sr.ReadToEnd();
  }
  await userFile.DeleteAsync();
}


OS Security 


All applications are subject to OS restrictions, based on the user’s login privileges. 
These restrictions affect file I/O as well as other capabilities, such as access to the 
Windows Registry. 


In Windows and Unix, there are two types of accounts: 


• An administrative/superuser account that imposes no restrictions in accessing 
the local computer 

• A limited permissions account that restricts administrative functions and 
visibility of other users’ data 


On Windows, a feature called User Account Control (UAC) means that administra- 
tors receive two tokens or “hats” when logging in: an administrative hat and an 
ordinary user hat. By default, programs run wearing the ordinary user hat—with 
restricted permissions—unless the program requests administrative elevation. The 
user must then approve the request in the dialog box that’s presented. 
On Unix, users typically log in with restricted accounts. That is also true for 
administrators, to lessen the probability of inadvertently damaging the system. When a 
user needs to run a command that requires elevated permissions, they precede the 
command with sudo (short for super-user do). 


By default, your application will run with restricted user privileges. This means that 
you must either: 


• Write your application such that it can run without administrative privileges. 

• Demand administrative elevation in the application manifest (Windows only), 
or detect the lack of required privileges and alert the user to restart the 
application as an administrator/super-user. 


The first option is safer and more convenient to the user. Designing your program 
to run without administrative privileges is easy in most cases. 


You can find out whether you're running under an administrative account, as 
follows: 


// Requires the System.Runtime.InteropServices and System.Security.Principal namespaces
[DllImport ("libc")]
public static extern uint getuid();

static bool IsRunningAsAdmin()
{
  if (RuntimeInformation.IsOSPlatform (OSPlatform.Windows))
  {
    using var identity = WindowsIdentity.GetCurrent();
    var principal = new WindowsPrincipal (identity);
    return principal.IsInRole (WindowsBuiltInRole.Administrator);
  }
  return getuid() == 0;   // The super-user (root) has user ID 0 on Unix
}
With UAC enabled on Windows, this returns true only if the current process has 
administrative elevation. On Linux, it returns true only if the current process is run- 
ning as super-user (e.g., sudo myapp). 
Running in a Standard User Account 
Here are the key things that you cannot do in a standard user account: 
• Write to the following directories: 
  - The OS folder (typically \Windows or /bin, /sbin, ...) and subdirectories 
  - The program files folder (\Program Files or /usr/bin, /opt) and 
    subdirectories 
  - The root of the OS drive (e.g., C:\ or /) 
• Write to the HKEY_LOCAL_MACHINE branch of the Registry (Windows) 
• Read performance monitoring (WMI) data (Windows) 
Additionally, as an ordinary Windows user (or even as an administrator), you might 
be refused access to files or resources that belong to other users. Windows uses a 
system of access control lists (ACLs) to protect such resources; you can query and 
assert your own rights in the ACLs via types in System.Security.AccessControl. 
ACLs can also be applied to cross-process wait handles, described in Chapter 22. 


If you're refused access to anything as a result of OS security, the CLR detects the 
failure and throws an UnauthorizedAccessException (rather than failing silently). 


In most cases, you can deal with standard user restrictions as follows: 


• Write files to their recommended locations. 

• Avoid using the Registry for information that can be stored in files (aside from the 
HKEY_CURRENT_USER hive, which you will have read/write access to on 
Windows only). 

• Register ActiveX or COM components during setup (Windows only). 


The recommended location for user documents is SpecialFolder.MyDocuments: 


string docsFolder = Environment.GetFolderPath
                    (Environment.SpecialFolder.MyDocuments);

string path = Path.Combine (docsFolder, "test.txt");


The recommended location for configuration files that a user might need to modify 
outside of your application is SpecialFolder.ApplicationData (current user only) 
or SpecialFolder.CommonApplicationData (all users). You typically create 
subdirectories within these folders, based on your organization and product name. 
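A minimal sketch of that convention follows. It assumes hypothetical company and product names (MyCompany and MyProduct in the usage below) and simply combines them with the SpecialFolder.ApplicationData root, creating the folder if it doesn't already exist. The AppFolders class is illustrative, not a framework type.

```csharp
using System;
using System.IO;

static class AppFolders
{
  // Returns <ApplicationData>/<company>/<product>, creating it if necessary.
  // "company" and "product" stand for your own (hypothetical) names.
  public static string GetUserSettingsFolder (string company, string product)
  {
    string root = Environment.GetFolderPath (
      Environment.SpecialFolder.ApplicationData);

    string folder = Path.Combine (root, company, product);
    Directory.CreateDirectory (folder);   // Does nothing if it already exists
    return folder;
  }
}
```

For example, Path.Combine (AppFolders.GetUserSettingsFolder ("MyCompany", "MyProduct"), "settings.json") gives a per-user settings file path; passing SpecialFolder.CommonApplicationData instead would give the all-users equivalent.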


Administrative Elevation and Virtualization 


With an application manifest, you can request that Windows prompt the user for 
administrative elevation whenever running your program (Linux ignores this 
request): 


<?xml version="1.0" encoding="utf-8"?> 
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1"> 
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v2"> 
<security> 
<requestedPrivileges> 
<requestedExecutionLevel level="requireAdministrator" /> 
</requestedPrivileges> 
</security> 
</trustInfo> 
</assembly> 


(We describe application manifests in more detail in Chapter 18.) 


If you replace requireAdministrator with asInvoker, it instructs Windows that 
administrative elevation is not required. The effect is almost the same as not having 
an application manifest at all—except that virtualization is disabled. Virtualization is 
a temporary measure introduced with Windows Vista to help old applications run 
correctly without administrative privileges. The absence of an application manifest 
with a requestedExecutionLevel element activates this backward-compatibility 
feature. 


Virtualization comes into play when an application writes to the Program Files or 
Windows directory, or the HKEY_LOCAL_MACHINE area of the Registry. Instead 
of throwing an exception, changes are redirected to a separate location on the hard 
disk where they can’t affect the original data. This prevents the application from 
interfering with the OS—or other well-behaved applications. 


Memory-Mapped Files 


Memory-mapped files provide two key features: 


• Efficient random access to file data 
• The ability to share memory between different processes on the same computer 

The types for memory-mapped files reside in the System.IO.MemoryMappedFiles 
namespace. Internally, they work by wrapping the operating system’s API for 
memory-mapped files. 


Memory-Mapped Files and Random File I/O 
Although an ordinary FileStream allows random file I/O (by setting the stream’s 
Position property), it’s optimized for sequential I/O. As a rough rule of thumb: 
• FileStreams are approximately 10 times faster than memory-mapped files for 
sequential I/O. 
• Memory-mapped files are approximately 10 times faster than FileStreams for 
random I/O. 


Changing a FileStream’s Position can cost several microseconds—which adds up 
if done within a loop. A FileStream is also unsuitable for multithreaded access— 
because its position changes as it is read or written. 


Here are the steps to create a memory-mapped file: 


1. Obtain a FileStream as you would ordinarily. 
2. Instantiate a MemoryMappedFile, passing in the file stream. 
3. Call CreateViewAccessor on the memory-mapped file object. 

The last step gives you a MemoryMappedViewAccessor object that provides methods 
for randomly reading and writing simple types, structures, and arrays (more on this 
in “Working with View Accessors” on page 685). 
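Here is a sketch that follows those three steps literally, using the CreateFromFile overload that accepts a FileStream. The class and method names are illustrative; it assumes the file already exists and is non-empty, because a capacity of 0 tells CreateFromFile to use the file's current length.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedFileSteps
{
  // Writes value at position in an existing, non-empty file, then reads it back.
  public static byte WriteAndReadByte (string path, long position, byte value)
  {
    // Step 1: obtain a FileStream as you would ordinarily.
    using FileStream fs = new FileStream (path, FileMode.Open, FileAccess.ReadWrite);

    // Step 2: instantiate a MemoryMappedFile, passing in the file stream.
    // Capacity 0 = "use the file's current length". The final argument (leaveOpen)
    // is true because the using declaration above already closes the stream.
    using MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile (
      fs, null, 0, MemoryMappedFileAccess.ReadWrite,
      HandleInheritability.None, true);

    // Step 3: call CreateViewAccessor for random reads and writes.
    using MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor();

    accessor.Write (position, value);
    return accessor.ReadByte (position);
  }
}
```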
The following creates a one million-byte file and then uses the memory-mapped file 
API to read and then write a byte at position 500,000: 


File.WriteAllBytes ("long.bin", new byte [1000000]);

using MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile ("long.bin");
using MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor();

accessor.Write (500000, (byte) 77);
Console.WriteLine (accessor.ReadByte (500000));   // 77


You can also specify a map name and capacity when calling CreateFromFile. Speci- 
fying a non-null map name allows the memory block to be shared with other pro- 
cesses (see the following section); specifying a capacity automatically enlarges the 
file to that value. The following creates a 1,000-byte file: 


File.WriteAllBytes ("short.bin", new byte [1]);
using var mmf = MemoryMappedFile.CreateFromFile
                ("short.bin", FileMode.Create, null, 1000);   // short.bin is now 1,000 bytes
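To confirm the capacity behavior described above, this sketch maps a file with a given capacity and then reports the file's length. It uses FileMode.OpenOrCreate rather than FileMode.Create so that an existing file isn't truncated first (the capacity must then be at least the file's current length); the CapacityDemo name is illustrative.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

static class CapacityDemo
{
  // Maps path with the given capacity and returns the file's length afterward.
  public static long MapWithCapacity (string path, long capacity)
  {
    using (MemoryMappedFile.CreateFromFile (path, FileMode.OpenOrCreate, null, capacity))
    {
      // Nothing to do here: creating the map is what enlarges the file.
    }
    return new FileInfo (path).Length;
  }
}
```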


Memory-Mapped Files and Shared Memory (Windows) 


Under Windows, you can also use memory-mapped files as a means of sharing 
memory between processes on the same computer. One process creates a shared 
memory block by calling MemoryMappedFile.CreateNew, and then other processes 
subscribe to that same memory block by calling MemoryMappedFile.OpenExisting 
with the same name. Although it’s still referred to as a memory-mapped “file,” it 
resides entirely in memory and has no disk presence. 


The following code creates a 500-byte shared memory-mapped file and writes the 
integer 12345 at position 0: 


using (MemoryMappedFile mmFile = MemoryMappedFile.CreateNew ("Demo", 500))
using (MemoryMappedViewAccessor accessor = mmFile.CreateViewAccessor())
{
  accessor.Write (0, 12345);
  Console.ReadLine();   // Keep shared memory alive until user hits Enter.
}


The following code opens that memory-mapped file and reads that integer: 


// This can run in a separate executable:
using (MemoryMappedFile mmFile = MemoryMappedFile.OpenExisting ("Demo"))
using (MemoryMappedViewAccessor accessor = mmFile.CreateViewAccessor())
  Console.WriteLine (accessor.ReadInt32 (0));   // 12345


Cross-Platform Interprocess Shared Memory 


Both Windows and Unix allow multiple processes to memory-map the same file. 
You must exercise care to ensure appropriate file-sharing settings: 


static void Writer()
{
  var file = Path.Combine (TestDirectory, "interprocess.bin");
  File.WriteAllBytes (file, new byte [100]);

  using (FileStream fs =
    new FileStream (file, FileMode.Open, FileAccess.ReadWrite,
                    FileShare.ReadWrite))
  using (MemoryMappedFile mmf = MemoryMappedFile
    .CreateFromFile (fs, null, fs.Length, MemoryMappedFileAccess.ReadWrite,
                     HandleInheritability.None, true))
  using (MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor())
  {
    accessor.Write (0, 12345);
    Console.ReadLine();   // Keep shared memory alive until user hits Enter.
  }

  // Delete only after the handles above are closed: Windows won't delete a
  // file that is still open without FileShare.Delete.
  File.Delete (file);
}

static void Reader()
{
  // This can run in a separate executable:
  var file = Path.Combine (TestDirectory, "interprocess.bin");
  using FileStream fs =
    new FileStream (file, FileMode.Open, FileAccess.ReadWrite,
                    FileShare.ReadWrite);
  using MemoryMappedFile mmf = MemoryMappedFile
    .CreateFromFile (fs, null, fs.Length, MemoryMappedFileAccess.ReadWrite,
                     HandleInheritability.None, true);
  using MemoryMappedViewAccessor accessor = mmf.CreateViewAccessor();

  Console.WriteLine (accessor.ReadInt32 (0));   // 12345
}

static string TestDirectory =>
  RuntimeInformation.IsOSPlatform (OSPlatform.Windows)
    ? @"C:\Test"
    : "/tmp";
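To watch these file-sharing settings at work without launching two executables, the following sketch opens the same file through two separate FileStreams in one process, maps each one, and writes through the first mapping while reading through the second. The two mappings merely stand in for the writer and reader processes above; the class and method names are illustrative.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

static class SharedMappingDemo
{
  // Writes through one mapping of the file and reads through a second,
  // independent mapping of the same file.
  public static int WriteAndReadViaSecondMapping (string path, int value)
  {
    // FileShare.ReadWrite on both streams lets each open succeed while the other is open.
    using var fs1 = new FileStream (path, FileMode.Open, FileAccess.ReadWrite,
                                    FileShare.ReadWrite);
    using var fs2 = new FileStream (path, FileMode.Open, FileAccess.ReadWrite,
                                    FileShare.ReadWrite);

    // leaveOpen = true: the using declarations above close the streams.
    using var writerMap = MemoryMappedFile.CreateFromFile (fs1, null, fs1.Length,
      MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, true);
    using var readerMap = MemoryMappedFile.CreateFromFile (fs2, null, fs2.Length,
      MemoryMappedFileAccess.ReadWrite, HandleInheritability.None, true);

    using var writer = writerMap.CreateViewAccessor();
    using var reader = readerMap.CreateViewAccessor();

    writer.Write (0, value);
    return reader.ReadInt32 (0);
  }
}
```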





Working with View Accessors 


Calling CreateViewAccessor on a MemoryMappedFile gives you a view accessor that 
lets you read/write values at random positions. 


The Read*/Write* methods accept numeric types, bool, and char, as well as arrays 
and structs that contain value-type elements or fields. Reference types—and arrays 
or structs that contain reference types—are prohibited because they cannot map 
into unmanaged memory. So, if you want to write a string, you must encode it into 
an array of bytes: 

byte[] data = Encoding.UTF8.GetBytes ("This is a test"); 


accessor.Write (0, data.Length); 
accessor.WriteArray (4, data, 0, data.Length); 
Notice that we wrote the length first. This means we know how many bytes to read 
back later: 


byte[] data = new byte [accessor.ReadInt32 (0)]; 
accessor.ReadArray (4, data, 0, data.Length); 
Console.WriteLine (Encoding.UTF8.GetString (data)); // This is a test 
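The length-prefix pattern above can be wrapped in a pair of helpers, sketched below with illustrative names. Note that the prefix counts UTF-8 bytes rather than characters, so strings containing non-ASCII characters round-trip correctly.

```csharp
using System.IO.MemoryMappedFiles;
using System.Text;

static class AccessorStrings
{
  // Writes a 4-byte length prefix (in bytes, not chars) followed by UTF-8 data.
  // Returns the total number of bytes written, so callers can pack strings.
  public static int WriteString (MemoryMappedViewAccessor accessor,
                                 long position, string text)
  {
    byte[] data = Encoding.UTF8.GetBytes (text);
    accessor.Write (position, data.Length);
    accessor.WriteArray (position + 4, data, 0, data.Length);
    return 4 + data.Length;
  }

  // Reads back a string written by WriteString.
  public static string ReadString (MemoryMappedViewAccessor accessor, long position)
  {
    byte[] data = new byte [accessor.ReadInt32 (position)];
    accessor.ReadArray (position + 4, data, 0, data.Length);
    return Encoding.UTF8.GetString (data);
  }
}
```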


Here's an example of reading/writing a struct: 


struct Data { public int X, Y; } 


var data = new Data { X = 123, Y = 456 };
accessor.Write (0, ref data);
accessor.Read (0, out data);
Console.WriteLine (data.X + " " + data.Y);   // 123 456
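Because Write copies the struct's raw bytes into the view, you can also read its fields individually at their offsets. The sketch below assumes sequential layout (the default for C# structs), which places X at offset 0 and Y at offset 4; the StructAccess name is illustrative.

```csharp
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout (LayoutKind.Sequential)]
struct Data { public int X, Y; }

static class StructAccess
{
  // Writes a Data struct at position 0, then reads its two int fields separately.
  public static (int x, int y) WriteStructReadFields (MemoryMappedViewAccessor accessor)
  {
    var data = new Data { X = 123, Y = 456 };
    accessor.Write (0, ref data);                     // Copies 8 bytes into the view
    return (accessor.ReadInt32 (0), accessor.ReadInt32 (4));
  }
}
```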


The Read and Write methods are surprisingly slow. You can get much better perfor- 
mance by directly accessing the underlying unmanaged memory via a pointer. Fol- 
lowing on from the previous example: 


unsafe
{
  byte* pointer = null;
  try
  {
    accessor.SafeMemoryMappedViewHandle.AcquirePointer (ref pointer);
    int* intPointer = (int*) pointer;
    Console.WriteLine (*intPointer);   // 123
  }
  finally
  {
    if (pointer != null)
      accessor.SafeMemoryMappedViewHandle.ReleasePointer();
  }
}


Your project must be configured to allow unsafe code. You can do that by editing 
your .csproj file: 


<PropertyGroup> 
<AllowUnsafeBlocks>true</AllowUnsafeBlocks> 
</PropertyGroup> 


The performance advantage of pointers is even more pronounced when working 
with large structures because they let you work directly with the raw data rather 
than using Read/Write to copy data between managed and unmanaged memory. We 
explore this further in Chapter 25. 
16 


Networking 








The Framework offers a variety of classes in the System.Net.* namespaces for com- 
municating via standard network protocols, such as HTTP, TCP/IP, and FTP. Here's 
a summary of the key components: 


• A WebClient facade class for simple download/upload operations via HTTP or 
FTP 

• WebRequest and WebResponse classes for low-level control over client-side 
HTTP or FTP operations 

• HttpClient for consuming HTTP web APIs and RESTful services 

• HttpListener for writing an HTTP server 

• SmtpClient for constructing and sending mail messages via SMTP 

• Dns for converting between domain names and addresses 

• TcpClient, UdpClient, TcpListener, and Socket classes for direct access to the 
transport and network layers 


These types are all part of .NET Standard 2.0, which means UWP applications can 
use them. UWP apps can also use the Windows Runtime (WinRT) types for TCP 
and UDP communication in Windows.Networking.Sockets, which we demonstrate 
in the final section in this chapter. These have the advantage of encouraging asyn- 
chronous programming. 


The .NET types in this chapter are in the System.Net.* and System.IO namespaces. 


Network Architecture 


Figure 16-1 illustrates the .NET networking types and the communication layers in 
which they reside. Most types reside in the transport layer or application layer. The 
transport layer defines basic protocols for sending and receiving bytes (TCP and 
UDP); the application layer defines higher-level protocols designed for specific 
applications such as retrieving web pages (HTTP), transferring files (FTP), sending 
mail (SMTP), and converting between domain names and IP addresses (DNS). 





[Figure 16-1 diagram: application layer; transport layer (including the TcpListener 
and UdpClient facade classes); network and link layers (IP address); physical 
layer (MAC address)] 

Figure 16-1. Network architecture 


It's usually most convenient to program at the application layer; however, there are a 
couple of reasons why you might want to work directly at the transport layer. One is 
if you need an application protocol not provided in .NET Core, such as POP3 for 
retrieving mail. Another is if you want to invent a custom protocol for a special 
application such as a peer-to-peer client. 


Of the application protocols, HTTP is special in its applicability to general-purpose 
communication. Its basic mode of operation, “give me the web page with this 
URL,” adapts nicely to “get me the result of calling this endpoint with these 
arguments.” (In addition to the “get” verb, there are “put,” “post,” and “delete,” 
allowing for REST-based services.) 


HTTP also has a rich set of features that are useful in multitier business applications 
and service-oriented architectures, such as protocols for authentication and encryp- 
tion, message chunking, extensible headers and cookies, and the ability to have 
many server applications share a single port and IP address. For these reasons, 
HTTP is well supported in .NET Core—both directly, as described in this chapter, 
and at a higher level, through such technologies as Web API and ASP.NET Core. 
.NET Core provides client-side support for FTP, the popular internet protocol for 
sending and receiving files. Server-side support comes in the form of IIS or Unix- 
based server software. 


As the preceding discussion makes clear, networking is a field that is awash in acro- 
nyms. We list the most common in Table 16-1. 


Table 16-1. Network acronyms 

Acronym  Expansion                       Notes
DNS      Domain Name System              Converts between domain names (e.g., ebay.com)
                                         and IP addresses (e.g., 199.54.213.2)
FTP      File Transfer Protocol          Internet-based protocol for sending and receiving files
HTTP     Hypertext Transfer Protocol     Retrieves web pages and runs web services
IIS      Internet Information Services   Microsoft's web server software
IP       Internet Protocol               Network-layer protocol below TCP and UDP
LAN      Local Area Network              Most LANs use internet-based protocols such as TCP/IP
POP      Post Office Protocol            Retrieves internet mail
REST     REpresentational State Transfer A popular web service architecture that uses
                                         machine-followable links in responses and that
                                         can operate over basic HTTP
SMTP     Simple Mail Transfer Protocol   Sends internet mail
TCP      Transmission Control Protocol   Transport-layer internet protocol on top of which
                                         most higher-layer services are built
UDP      User Datagram Protocol          Transport-layer internet protocol used for
                                         low-overhead services such as VoIP
UNC      Universal Naming Convention     \\computer\sharename\filename
URI      Uniform Resource Identifier     Ubiquitous resource naming system (e.g.,
                                         http://www.amazon.com or mailto:joe@bloggs.org)
URL      Uniform Resource Locator        Technical meaning (fading from use): subset of URI;
                                         popular meaning: synonym of URI





Addresses and Ports 


For communication to work, a computer or device requires an address. The internet 
uses two addressing systems: 





IPv4 
Currently the dominant addressing system; IPv4 addresses are 32 bits wide. 
When string-formatted, IPv4 addresses are written as four dot-separated deci- 
mals (e.g., 101.102.103.104). An address can be unique in the world—or unique 
within a particular subnet (such as on a corporate network). 
IPv6 
The newer 128-bit addressing system. Addresses are string-formatted in 
hexadecimal with a colon separator (e.g., [3EA0:FFFF:198A:E4A3:4FF2:54fA:41BC:8D31]). 
.NET Core requires that you add square brackets around the address. 


The IPAddress class in the System.Net namespace represents an address in either 
protocol. It has a constructor accepting a byte array, and a static Parse method 
accepting a correctly formatted string: 


IPAddress a1 = new IPAddress (new byte[] { 101, 102, 103, 104 });
IPAddress a2 = IPAddress.Parse ("101.102.103.104");
Console.WriteLine (a1.Equals (a2));      // True
Console.WriteLine (a1.AddressFamily);    // InterNetwork

IPAddress a3 = IPAddress.Parse
  ("[3EA0:FFFF:198A:E4A3:4FF2:54fA:41BC:8D31]");
Console.WriteLine (a3.AddressFamily);    // InterNetworkV6
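When the string comes from user input, IPAddress.TryParse avoids the exception that Parse throws on malformed text, and GetAddressBytes is the inverse of the byte-array constructor. The sketch below shows both; the AddressChecks class is illustrative.

```csharp
using System.Net;
using System.Net.Sockets;

static class AddressChecks
{
  // Returns the address family of a valid IPv4/IPv6 string, or null if the
  // string isn't a valid address (TryParse doesn't throw, unlike Parse).
  public static AddressFamily? FamilyOf (string text) =>
    IPAddress.TryParse (text, out IPAddress address)
      ? address.AddressFamily
      : (AddressFamily?) null;

  // GetAddressBytes is the inverse of the byte-array constructor.
  public static bool RoundTripsThroughBytes (IPAddress address) =>
    new IPAddress (address.GetAddressBytes()).Equals (address);
}
```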


The TCP and UDP protocols break out each IP address into 65,535 ports, allowing a 
computer on a single address to run multiple applications, each on its own port. 
Many applications have standard default port assignments; for instance, HTTP uses 
port 80; SMTP uses port 25. 


The TCP and UDP ports from 49152 to 65535 are officially 
unassigned, so they are good for testing and small-scale 
deployments. 


An IP address and port combination is represented in the .NET Framework by the 
IPEndPoint class: 


IPAddress a = IPAddress.Parse ("101.102.103.104"); 
IPEndPoint ep = new IPEndPoint (a, 222); // Port 222 
Console.WriteLine (ep.ToString()); // 101.102.103.104:222 
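Going the other way, IPEndPoint.Parse (assuming .NET Core 3.0 or later) turns an "address:port" string back into an IPEndPoint. IPv6 endpoints are formatted with the address in square brackets, such as [::1]:8080. The EndPoints class below is illustrative.

```csharp
using System.Net;

static class EndPoints
{
  // Formats an endpoint as "address:port" and parses it back.
  // IPEndPoint.Parse requires .NET Core 3.0 or later.
  public static IPEndPoint RoundTrip (IPEndPoint endPoint) =>
    IPEndPoint.Parse (endPoint.ToString());
}
```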


Firewalls block ports. In many corporate environments, only a 
few ports are open—typically, port 80 (for unencrypted 
HTTP) and port 443 (for secure HTTP). 


URIs 


A URI is a specially formatted string that describes a resource on the internet or a 
LAN, such as a web page, file, or email address. Examples include 
http://www.ietf.org, ftp://myisp/doc.txt, and mailto:joe@bloggs.com. The exact 
formatting is defined by the Internet Engineering Task Force (IETF). 


A URI can be broken up into a series of elements—typically, scheme, authority, and 
path. The Uri class in the System namespace performs just this division, exposing a 
property for each element, as illustrated in Figure 16-2. 
[Figure 16-2 diagram: sample http:// and file:// URIs annotated with Uri property 
names such as Scheme and PathAndQuery] 

Figure 16-2. URI properties 
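The following sketch prints the element properties of a parsed Uri, including the Scheme and PathAndQuery properties labeled in Figure 16-2. The sample URI and the UriParts class are illustrative.

```csharp
using System;

static class UriParts
{
  // Parses uriString and lists the properties that break it into elements.
  public static string Describe (string uriString)
  {
    var uri = new Uri (uriString);
    return $"Scheme={uri.Scheme} Authority={uri.Authority} Host={uri.Host} " +
           $"Port={uri.Port} AbsolutePath={uri.AbsolutePath} Query={uri.Query} " +
           $"Fragment={uri.Fragment} PathAndQuery={uri.PathAndQuery}";
  }
}
```

For example, Describe ("http://www.domain.com:9999/info/page.html?query#top") reports a Host of www.domain.com, a Port of 9999, and a PathAndQuery of /info/page.html?query; the fragment (#top) is not part of PathAndQuery.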














The Uri class is useful when you need to validate the format of 
a URI string or to split a URI into its component parts. Other- 
wise, you can treat a URI simply as a string—most networking 
methods are overloaded to accept either a Uri object or a 
string. 


You can construct a Uri object by passing any of the following strings into its 
constructor: 


• A URI string, such as http://www.ebay.com or file://janespc/sharedpics/dolphin.jpg 

• An absolute path to a file on your hard disk, such as c:\myfiles\data.xlsx or, on 
Unix, /tmp/myfiles/data.xlsx 

• A UNC path to a file on the LAN, such as \\janespc\sharedpics\dolphin.jpg 


File and UNC paths are automatically converted to URIs: the “file:” protocol is 
added, and backslashes are converted to forward slashes. The Uri constructors also 
perform some basic cleanup on your string before creating the Uri, including con- 
verting the scheme and hostname to lowercase and removing default and blank port 
numbers. If you supply a URI string without the scheme, such as “www.test.com”, a 
UriFormatException is thrown. 
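If you'd rather not catch UriFormatException, Uri.TryCreate performs the same parsing without throwing. The sketch below (the UriCleanup name is illustrative) returns the cleaned-up form of an absolute URI, or null when the string isn't one, for instance because it lacks a scheme.

```csharp
using System;

static class UriCleanup
{
  // Returns the normalized form of an absolute URI (lowercase scheme and host,
  // default port removed), or null if the text isn't a valid absolute URI.
  public static string Normalize (string text) =>
    Uri.TryCreate (text, UriKind.Absolute, out Uri uri) ? uri.ToString() : null;
}
```

For example, Normalize ("HTTP://WWW.Test.COM:80/Path") returns http://www.test.com/Path (the path's case is preserved), whereas Normalize ("www.test.com") returns null.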


Uri has an IsLoopback property, which indicates whether the Uri references the 
local host (IP address 127.0.0.1), and an IsFile property, which indicates whether 
the Uri references a local or UNC (IsUnc) path (IsUnc reports false for a Samba 
share mounted in a Linux filesystem). If IsFile returns true, the LocalPath prop- 
erty returns a version of AbsolutePath that is friendly to the local OS (with slashes 
or backslashes as appropriate to the OS), on which you can call File.Open. 


Instances of Uri have read-only properties. To modify an existing Uri, instantiate a 
UriBuilder object; this has writable properties and can be converted back via its 
Uri property. 
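A minimal sketch of that round trip: copy a Uri into a UriBuilder, change some properties, and read the result back through the builder's Uri property. The UriEdits name and the sample values in the usage are illustrative.

```csharp
using System;

static class UriEdits
{
  // Uri is immutable, so edit a UriBuilder copy and convert it back.
  public static Uri WithPortAndPath (Uri original, int port, string path)
  {
    var builder = new UriBuilder (original) { Port = port, Path = path };
    return builder.Uri;
  }
}
```

For example, WithPortAndPath (new Uri ("http://www.domain.com/info/"), 8080, "/other/page.html") yields http://www.domain.com:8080/other/page.html, leaving the original Uri unchanged.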


Uri also provides methods for comparing and subtracting paths: 
Uri info = new Uri ("http://www.domain.com:80/info/");
Uri page = new Uri ("http://www.domain.com/info/page.html");

Console.WriteLine (info.Host);              // www.domain.com
Console.WriteLine (info.Port);              // 80
Console.WriteLine (page.Port);              // 80 (Uri knows the default HTTP port)
Console.WriteLine (info.IsBaseOf (page));   // True

Uri relative = info.MakeRelativeUri (page);
Console.WriteLine (relative.IsAbsoluteUri); // False
Console.WriteLine (relative.ToString());    // page.html


A relative Uri, such as page.html in this example, will throw an exception if you call 
almost any property or method other than IsAbsoluteUri and ToString(). You 
can directly instantiate a relative Uri as follows: 


Uri u = new Uri ("page.html", UriKind.Relative); 


A trailing slash is significant in a URI and makes a difference 
as to how a server processes a request if a path component is 
present. 


In a traditional web server, for instance, given the URI 
http://www.albahari.com/nutshell/, you can expect an HTTP web 
server to look in the nutshell subdirectory in the site’s web 
folder and return the default document (usually index.html). 


Without the trailing slash, the web server will instead look for 
a file called nutshell (without an extension) directly in the site's 
root folder—which is usually not what you want. If no such 
file exists, most web servers will assume the user mistyped and 
will return a 301 Permanent Redirect error, suggesting the cli- 
ent retries with the trailing slash. A NET HTTP client, by 
default, will respond transparently to a 301 in the same way as 
a web browser—by retrying with the suggested URI. This 
means that if you omit a trailing slash when it should have 
been included, your request will still work—but will suffer an 
unnecessary extra round trip. 
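The trailing slash also matters on the client side when you resolve a relative URI against a base URI. The following sketch (the Resolve helper is hypothetical) uses Uri's (Uri, string) constructor, which applies the standard rules for relative references: the last segment of the base path is replaced unless the base ends in a slash.

```csharp
using System;

static class TrailingSlashExample
{
  // Hypothetical helper: resolves 'relative' against 'baseUri'
  // using the Uri (Uri baseUri, string relativeUri) constructor.
  public static string Resolve (string baseUri, string relative) =>
    new Uri (new Uri (baseUri), relative).ToString();
}
```

Resolving code.html against http://www.albahari.com/nutshell/ gives http://www.albahari.com/nutshell/code.html, whereas resolving it against http://www.albahari.com/nutshell gives http://www.albahari.com/code.html.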


The Uri class also provides static helper methods such as EscapeUriString(), which converts a string to a valid URL by converting characters that aren't valid in a URL, such as spaces and all characters with a value greater than 127, to hexadecimal (percent-encoded) representation. The CheckHostName() and CheckSchemeName() methods accept a string and check whether it is syntactically valid for the given property (although they do not attempt to determine whether a host or URI exists).
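Here is a brief sketch of the two checking methods (the wrapper class UriValidation is hypothetical). Both are purely syntactic: neither performs a DNS lookup nor contacts a server.

```csharp
using System;

static class UriValidation
{
  // Classifies a host string (Dns, IPv4, IPv6, or Unknown) without any network access.
  public static UriHostNameType HostKind (string host) => Uri.CheckHostName (host);

  // True if the string is a syntactically valid scheme: a letter followed by
  // letters, digits, '+', '-' or '.'.
  public static bool IsValidScheme (string scheme) => Uri.CheckSchemeName (scheme);
}
```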


Client-Side Classes 


WebRequest and WebResponse are common base classes for managing both HTTP and FTP client-side activity as well as the “file:” protocol. They encapsulate the request/response model that these protocols all share: the client makes a request and then awaits a response from a server.


WebClient is a convenient facade class that does the work of calling WebRequest and WebResponse, saving you some coding. WebClient gives you a choice of dealing in strings, byte arrays, files, or streams; WebRequest and WebResponse support just streams. Unfortunately, you cannot rely entirely on WebClient, because it doesn’t support some features (such as cookies).


HttpClient is a newer API for working with HTTP and is designed to work well 
with web APIs, REST-based services, and custom authentication schemes. In .NET 
Framework, HttpClient relied on WebRequest and WebResponse, but in .NET Core, 
it handles HTTP itself. 


For simply downloading/uploading a file, string, or byte array, both WebClient and HttpClient are suitable. Both have asynchronous methods, although only WebClient offers progress reporting.


WebClient 


Here are the steps in using WebClient: 


1. Instantiate a WebClient object.
2. Assign the Proxy property.
3. Assign the Credentials property if authentication is required.
4. Call a DownloadXXX or UploadXXX method with the desired URI.


Its download methods are as follows: 


public void DownloadFile (string address, string fileName); 
public string DownloadString (string address); 
public byte[] DownloadData (string address); 
public Stream OpenRead (string address); 


Each is overloaded to accept a Uri object instead of a string address. The upload 
methods are similar; their return values contain the response (if any) from the 
server: 


public byte[] UploadFile  (string address, string fileName);
public byte[] UploadFile  (string address, string method, string fileName);
public string UploadString(string address, string data);
public string UploadString(string address, string method, string data);
public byte[] UploadData  (string address, byte[] data);
public byte[] UploadData  (string address, string method, byte[] data);
public byte[] UploadValues(string address, NameValueCollection data);
public byte[] UploadValues(string address, string method,
                                           NameValueCollection data);
public Stream OpenWrite   (string address);
public Stream OpenWrite   (string address, string method);


You can use the UploadValues methods to post values to an HTTP form, with a method argument of "POST". WebClient also has a BaseAddress property; this allows you to specify a string to be prefixed to all addresses, such as http://www.mysite.com/data.


Here's how to download the code samples page for this book to a file in the current 
folder: 


WebClient wc = new WebClient { Proxy = null }; 
wc.DownloadFile ("http://www.albahari.com/nutshell/code.aspx", "code.htm"); 


WebClient implements IDisposable under duress, by virtue of deriving from Component (this allows it to be sited in the Visual Studio Designer’s component tray). Its Dispose method does nothing useful at runtime, however, so you don’t need to dispose WebClient instances.


WebClient provides asynchronous versions of its long-running methods (Chapter 14) that return tasks that you can await:


await wc.DownloadFileTaskAsync ("http://oreilly.com", "webpage.htm"); 


(The TaskAsync suffix disambiguates these methods from the old EAP-based asynchronous methods, which use the Async suffix.) Unfortunately, the new methods don't support the standard Task-based Asynchronous Pattern (TAP) for cancellation and progress reporting. Instead, for cancellation you must call the CancelAsync method on the WebClient object, and for progress reporting, handle the DownloadProgressChanged/UploadProgressChanged event. The following downloads a web page with progress reporting, canceling the download if it takes longer than five seconds:


var wc = new WebClient();

wc.DownloadProgressChanged += (sender, args) =>
  Console.WriteLine (args.ProgressPercentage + "% complete");

Task.Delay (5000).ContinueWith (ant => wc.CancelAsync());

await wc.DownloadFileTaskAsync ("http://oreilly.com", "webpage.htm");


When a request is canceled, a WebException is thrown whose 
Status property is WebExceptionStatus.RequestCanceled. 
(For historical reasons, an OperationCanceledException is 
not thrown.) 


The progress-related events capture and post to the active synchronization context, so their handlers can update UI controls without needing Dispatcher.BeginInvoke.


Avoid using the same WebClient object to perform more than one operation in sequence if you're relying on cancellation or progress reporting, because doing so can result in race conditions.





694 | Chapter 16: Networking 


WebRequest and WebResponse 


WebRequest and WebResponse are more complex to use than WebClient but also 
more flexible. Here's how to get started: 


1. Call WebRequest.Create with a URI to instantiate a web request.
2. Assign the Proxy property.
3. Assign the Credentials property if authentication is required.

To upload data:

4. Call GetRequestStream on the request object, and then write to the stream. Go to step 5 if a response is expected.

To download data:

5. Call GetResponse on the request object to instantiate a web response.
6. Call GetResponseStream on the response object and then read the stream (a StreamReader can help!).


The following downloads the code samples web page to a file (a rewrite of the preceding example):


WebRequest req = WebRequest.Create
                 ("http://www.albahari.com/nutshell/code.html");
req.Proxy = null;
using (WebResponse res = req.GetResponse())
using (Stream rs = res.GetResponseStream())
using (FileStream fs = File.Create ("code.html"))
  rs.CopyTo (fs);


Here's the asynchronous equivalent: 


WebRequest req = WebRequest.Create
                 ("http://www.albahari.com/nutshell/code.html");
req.Proxy = null;
using (WebResponse res = await req.GetResponseAsync())
using (Stream rs = res.GetResponseStream())
using (FileStream fs = File.Create ("code.html"))
  await rs.CopyToAsync (fs);





The web response object has a ContentLength property, indicating the length of the response stream in bytes, as reported by the server. This value comes from the response headers and might be missing or incorrect. In particular, if an HTTP server chooses the “chunked” mode to break up a large response, the ContentLength value is usually -1. The same can apply with dynamically generated pages.


The static Create method instantiates a subclass of the WebRequest type, such as HttpWebRequest or FtpWebRequest. Its choice of subclass depends on the URI’s prefix and is shown in Table 16-2.


Table 16-2. URI prefixes and web request types 


Prefix            Web request type
http: or https:   HttpWebRequest
ftp:              FtpWebRequest
file:             FileWebRequest





Casting a web request object to its concrete type 
(HttpWebRequest or FtpWebRequest) allows you to access its 
protocol-specific features. 


You can also register your own prefixes by calling WebRequest.RegisterPrefix. This requires a prefix along with a factory object with a Create method that instantiates an appropriate web request object.
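A minimal sketch of the registration mechanics follows. The "demo:" prefix and the DemoWebRequest, DemoRequestCreator, and PrefixRegistration types are hypothetical; the factory object implements the IWebRequestCreate interface, whose Create method receives the URI. A real implementation would also override members such as GetResponse, and recent .NET versions mark WebRequest as obsolete, so expect compiler warnings.

```csharp
using System;
using System.Net;

// Hypothetical request type for a "demo:" scheme (illustration only).
class DemoWebRequest : WebRequest
{
  readonly Uri _uri;
  public DemoWebRequest (Uri uri) => _uri = uri;
  public override Uri RequestUri => _uri;
}

// The factory: WebRequest.Create calls its Create method
// whenever a URI starts with the registered prefix.
class DemoRequestCreator : IWebRequestCreate
{
  public WebRequest Create (Uri uri) => new DemoWebRequest (uri);
}

static class PrefixRegistration
{
  // RegisterPrefix returns false if the prefix is already registered.
  public static bool Register() =>
    WebRequest.RegisterPrefix ("demo:", new DemoRequestCreator());
}
```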


The “https:” protocol is for secure (encrypted) HTTP, via Secure Sockets Layer 
(SSL). Both WebClient and WebRequest activate SSL transparently upon seeing this 
prefix (see “SSL” in “Working with HTTP” on page 706 later in this chapter). The “file:” protocol simply forwards requests to a FileStream object. Its purpose is to provide a consistent protocol for reading a URI, whether it be a web page, FTP site, or file path.


WebRequest has a Timeout property, in milliseconds. If a timeout occurs, a WebException is thrown with a Status property of WebExceptionStatus.Timeout. The default timeout is 100 seconds for HTTP and infinite for FTP.


You cannot recycle a WebRequest object for multiple requests—each instance is 
good for one job only. 


HttpClient 


HttpClient provides a higher-level layer for working with HTTP (in .NET Framework, it sits on top of HttpWebRequest and HttpWebResponse). It was written in response to the growth of HTTP-based web APIs and REST services to provide a better experience than WebClient when dealing with protocols more elaborate than simply fetching a web page; specifically:


• A single HttpClient instance supports concurrent requests. To get concurrency with WebClient, you need to create a fresh instance per concurrent request, which can get awkward when you introduce custom headers, cookies, and authentication schemes.

• HttpClient lets you write and plug in custom message handlers. This enables mocking in unit tests, and the creation of custom pipelines (for logging, compression, encryption, and so on). Unit-testing code that calls WebClient is a pain.

• HttpClient has a richer and extensible type system for headers and content.


HttpClient is not a complete replacement for WebClient, because it doesn’t directly support progress reporting. WebClient also has the advantage of supporting FTP, file:// and custom URI schemes. It’s also available in older Framework versions.


For a solution to progress reporting with HttpClient, see HttpClient With Progress.linq, or via LINQPad’s interactive samples gallery.
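The following is not that sample; it's a minimal sketch of one common approach, assuming you can get at the response body as a stream. It copies a stream in chunks and reports the running byte count through a callback (ProgressCopy and CopyWithProgressAsync are hypothetical names):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class ProgressCopy
{
  // Copies 'source' to 'destination' chunk by chunk, invoking 'onProgress'
  // with the total bytes copied so far. Returns the final byte count.
  public static async Task<long> CopyWithProgressAsync (
    Stream source, Stream destination, Action<long> onProgress, int bufferSize = 81920)
  {
    var buffer = new byte [bufferSize];
    long total = 0;
    int read;
    while ((read = await source.ReadAsync (buffer, 0, buffer.Length)) > 0)
    {
      await destination.WriteAsync (buffer, 0, read);
      total += read;
      onProgress?.Invoke (total);
    }
    return total;
  }
}
```

To use it with HttpClient, you could call GetAsync with HttpCompletionOption.ResponseHeadersRead (so the body isn't buffered up front), pass the stream from response.Content.ReadAsStreamAsync() as the source, and divide the byte count by response.Content.Headers.ContentLength to get a percentage when the server supplies a length (it can be null).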


The simplest way to use HttpClient is to instantiate it and then call one of its Get* 
methods, passing in a URI: 


string html = await new HttpClient().GetStringAsync ("http://linqpad.net");


(There's also GetByteArrayAsync and GetStreamAsync.) All I/O-bound methods in 
HttpClient are asynchronous (there are no synchronous equivalents). 


Unlike with WebClient, to get the best performance with HttpClient, you must reuse the same instance (otherwise things such as DNS resolution can be unnecessarily repeated and sockets are held open longer than necessary). HttpClient permits concurrent operations, so the following is legal and downloads two web pages at once:


var client = new HttpClient();
var task1 = client.GetStringAsync ("http://www.linqpad.net");
var task2 = client.GetStringAsync ("http://www.albahari.com");
Console.WriteLine (await task1);
Console.WriteLine (await task2);


HttpClient has a Timeout property and a BaseAddress property, which prefixes a 
URI to every request. HttpClient is somewhat of a thin shell: most of the other 
properties that you might expect to find here are defined in another class called 
HttpClientHandler. To access this class, you instantiate it and then pass the 
instance into HttpClient’s constructor: 


var handler = new HttpClientHandler { UseProxy = false }; 
var client = new HttpClient (handler); 


In this example, we told the handler to disable proxy support. There are also properties to control cookies, automatic redirection, authentication, and so on (we describe these in the following sections as well as in “Working with HTTP” on page 706).


GetAsync and response messages


The GetStringAsync, GetByteArrayAsync, and GetStreamAsync methods are convenient shortcuts for calling the more general GetAsync method, which returns a response message:


var client = new HttpClient();
// The GetAsync method also accepts a CancellationToken.
HttpResponseMessage response = await client.GetAsync ("http://...");
response.EnsureSuccessStatusCode();
string html = await response.Content.ReadAsStringAsync();


HttpResponseMessage exposes properties for accessing the headers (see “Working 
with HTTP” on page 706) and the HTTP StatusCode. Unlike with WebClient, an 
unsuccessful status code such as 404 (not found) doesn’t cause an exception to be 
thrown unless you explicitly call EnsureSuccessStatusCode. Communication or 
DNS errors, however, do throw exceptions (see “Exception Handling” on page 704). 


HttpContent has a CopyToAsync method for writing to another stream, which is 
useful in writing the output to a file: 


using (var fileStream = File.Create ("linqpad.html"))
  await response.Content.CopyToAsync (fileStream);


GetAsync is one of four methods corresponding to HTTP’s four verbs (the others 
are PostAsync, PutAsync, and DeleteAsync). We demonstrate PostAsync later in 
“Uploading Form Data” on page 707. 


SendAsync and request messages 


The four methods just described are all shortcuts for calling SendAsync, the single 
low-level method into which everything else feeds. To use this, you first construct 
an HttpRequestMessage: 


var client = new HttpClient();
var request = new HttpRequestMessage (HttpMethod.Get, "http://...");
HttpResponseMessage response = await client.SendAsync (request);
response.EnsureSuccessStatusCode();


Instantiating an HttpRequestMessage object means that you can customize properties of the request, such as the headers (see “Headers” on page 706) and the content itself, allowing you to upload data.


Uploading data and HttpContent 


After instantiating an HttpRequestMessage object, you can upload content by 
assigning its Content property. The type for this property is an abstract class called 
HttpContent. .NET Core includes the following concrete subclasses for different 
kinds of content (you can also write your own): 







• ByteArrayContent
• StringContent
• FormUrlEncodedContent (see “Uploading Form Data” on page 707)
• StreamContent


For example: 


var client = new HttpClient (new HttpClientHandler { UseProxy = false });
var request = new HttpRequestMessage (
  HttpMethod.Post, "http://www.albahari.com/EchoPost.aspx");
request.Content = new StringContent ("This is a test");
HttpResponseMessage response = await client.SendAsync (request);
response.EnsureSuccessStatusCode();
Console.WriteLine (await response.Content.ReadAsStringAsync());
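Because HttpContent objects are independent of any request, you can construct them and read them back to see exactly what would be uploaded. A minimal sketch (the ContentExamples class is hypothetical) covering three of the subclasses listed above:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static class ContentExamples
{
  public static HttpContent AsString() => new StringContent ("This is a test");

  public static HttpContent AsBytes() =>
    new ByteArrayContent (Encoding.UTF8.GetBytes ("This is a test"));

  // Key/value pairs are encoded as name1=value1&name2=value2
  public static HttpContent AsForm() => new FormUrlEncodedContent (new[]
  {
    new KeyValuePair<string, string> ("Name", "Joe Albahari"),
    new KeyValuePair<string, string> ("Company", "O'Reilly")
  });

  // Reads any HttpContent back as a string, to check what would be sent.
  public static Task<string> ReadAsync (HttpContent content) => content.ReadAsStringAsync();
}
```

StringContent defaults to a text/plain content type with UTF-8 encoding, and FormUrlEncodedContent sets application/x-www-form-urlencoded and encodes spaces as +.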


HttpMessageHandler 


We said previously that most of the properties for customizing requests are defined 
not in HttpClient, but in HttpClientHandler. The latter is actually a subclass of 
the abstract HttpMessageHandler class, defined as follows: 


public abstract class HttpMessageHandler : IDisposable
{
  protected internal abstract Task<HttpResponseMessage> SendAsync
    (HttpRequestMessage request, CancellationToken cancellationToken);

  public void Dispose();
  protected virtual void Dispose (bool disposing);
}


The SendAsync method is called from HttpClient’s SendAsync method.


HttpMessageHandler is simple enough to subclass easily and offers an extensibility 
point into HttpClient. 


Unit testing and mocking 


We can subclass HttpMessageHandler to create a mocking handler to assist with unit 
testing: 


class MockHandler : HttpMessageHandler
{
  Func <HttpRequestMessage, HttpResponseMessage> _responseGenerator;

  public MockHandler
    (Func <HttpRequestMessage, HttpResponseMessage> responseGenerator)
  {
    _responseGenerator = responseGenerator;
  }

  protected override Task <HttpResponseMessage> SendAsync
    (HttpRequestMessage request, CancellationToken cancellationToken)
  {
    cancellationToken.ThrowIfCancellationRequested();
    var response = _responseGenerator (request);
    response.RequestMessage = request;
    return Task.FromResult (response);
  }
}

Its constructor accepts a function that tells the mocker how to generate a response 
from a request. This is the most versatile approach because the same handler can 
test multiple requests. 


SendAsync is synchronous by virtue of Task.FromResult. We could have maintained asynchrony by having our response generator return a Task<HttpResponseMessage>, but this is pointless given that we can expect a mocking function to be short running. Here’s how to use our mocking handler:


var mocker = new MockHandler (request =>
  new HttpResponseMessage (HttpStatusCode.OK)
  {
    Content = new StringContent ("You asked for " + request.RequestUri)
  });

var client = new HttpClient (mocker);
var response = await client.GetAsync ("http://www.linqpad.net");
string result = await response.Content.ReadAsStringAsync();
Assert.AreEqual ("You asked for http://www.linqpad.net/", result);


(Assert.AreEqual is a method you'd expect to find in a unit-testing framework such as NUnit.)


Chaining handlers with DelegatingHandler 


You can create a message handler that calls another (resulting in a chain of handlers) by subclassing DelegatingHandler. You can use this to implement custom authentication, compression, and encryption protocols. The following demonstrates a simple logging handler:


class LoggingHandler : DelegatingHandler
{
  public LoggingHandler (HttpMessageHandler nextHandler)
  {
    InnerHandler = nextHandler;
  }

  protected async override Task <HttpResponseMessage> SendAsync
    (HttpRequestMessage request, CancellationToken cancellationToken)
  {
    Console.WriteLine ("Requesting: " + request.RequestUri);
    var response = await base.SendAsync (request, cancellationToken);
    Console.WriteLine ("Got response: " + response.StatusCode);
    return response;
  }
}


Notice that we've maintained asynchrony in overriding SendAsync. Introducing the async modifier when overriding a task-returning method is perfectly legal and, in this case, desirable.


A better solution than writing to the Console would be to have the constructor 
accept some kind of logging object. Better still would be to accept a couple of 
Action<T> delegates that tell it how to log the request and response objects. 
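Here is a sketch of that last suggestion. DelegateLoggingHandler is a hypothetical name; it accepts the next handler plus two Action<T> delegates, and passes the next handler to DelegatingHandler's constructor, which assigns InnerHandler for us. StubHandler is a test-only stand-in for a real handler, so that the chain can run without a network:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class DelegateLoggingHandler : DelegatingHandler
{
  readonly Action<HttpRequestMessage> _logRequest;
  readonly Action<HttpResponseMessage> _logResponse;

  public DelegateLoggingHandler (HttpMessageHandler nextHandler,
    Action<HttpRequestMessage> logRequest, Action<HttpResponseMessage> logResponse)
    : base (nextHandler)    // sets InnerHandler
  {
    _logRequest = logRequest;
    _logResponse = logResponse;
  }

  protected override async Task<HttpResponseMessage> SendAsync
    (HttpRequestMessage request, CancellationToken cancellationToken)
  {
    _logRequest (request);
    var response = await base.SendAsync (request, cancellationToken);
    _logResponse (response);
    return response;
  }
}

// Test-only inner handler: answers every request with 200 OK.
class StubHandler : HttpMessageHandler
{
  protected override Task<HttpResponseMessage> SendAsync
    (HttpRequestMessage request, CancellationToken cancellationToken) =>
    Task.FromResult (new HttpResponseMessage (HttpStatusCode.OK) { RequestMessage = request });
}
```

In production you would pass a real handler, for example new DelegateLoggingHandler (new HttpClientHandler(), req => Console.WriteLine ("Requesting: " + req.RequestUri), res => Console.WriteLine ("Got response: " + res.StatusCode)).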


Proxies 


A proxy server is an intermediary through which HTTP and FTP requests can be 
routed. Organizations sometimes set up a proxy server as the only means by which 
employees can access the internet—primarily because it simplifies security. A proxy 
has an address of its own and can demand authentication so that only selected users 
on the LAN can access the internet. 


You can instruct a WebClient or WebRequest object to route requests through a 
proxy server with a WebProxy object: 


// Create a WebProxy with the proxy's IP address and port. You can
// optionally set Credentials if the proxy needs a username/password.

WebProxy p = new WebProxy ("192.178.10.49", 808);
p.Credentials = new NetworkCredential ("username", "password");
// or:
p.Credentials = new NetworkCredential ("username", "password", "domain");

WebClient wc = new WebClient();
wc.Proxy = p;

// Same procedure with a WebRequest object:
WebRequest req = WebRequest.Create ("...");
req.Proxy = p;


To use a proxy with HttpClient, first create an HttpClientHandler, assign its Proxy 
property, and then feed that into HttpClient’s constructor: 


WebProxy p = new WebProxy ("192.178.10.49", 808); 
p.Credentials = new NetworkCredential ("username", "password", "domain"); 


var handler = new HttpClientHandler { Proxy = p }; 
var client = new HttpClient (handler); 


If you know there’s no proxy, it’s worth setting the Proxy property to null on WebClient and WebRequest objects. Otherwise, .NET Core might attempt to “autodetect” your proxy settings, adding up to 30 seconds to your request. If you're wondering why your web requests execute slowly, this is probably it!


HttpClientHandler also has a UseProxy property that you can assign to false instead of nulling out the Proxy property to defeat autodetection.


If you supply a domain when constructing the NetworkCredential, Windows-based 
authentication protocols are used. To use the currently authenticated Windows 
user, assign the static CredentialCache.DefaultNetworkCredentials value to the 
proxy’s Credentials property. 


As an alternative to repeatedly setting the Proxy, you can set the global default as 
follows: 


WebRequest.DefaultWebProxy = myWebProxy;

Or, like this:

WebRequest.DefaultWebProxy = null;


Whatever you set applies for the life of the application domain (unless some other 
code changes it!). 


Authentication 


You can supply a username and password to an HTTP or FTP site by creating a 
NetworkCredential object and assigning it to the Credentials property of 
WebClient or WebRequest: 


WebClient wc = new WebClient { Proxy = null }; 
wc.BaseAddress = "ftp://ftp.myserver.com"; 


// Authenticate, then upload and download a file to the FTP server. 
// The same approach also works for HTTP and HTTPS. 


string username = "myuser"; 
string password = "mypassword"; 
wc.Credentials = new NetworkCredential (username, password); 


wc.DownloadFile ("guestbook.txt", "guestbook.txt"); 


string data = "Hello from " + Environment.UserName + "!\r\n"; 
File.AppendAllText ("guestbook.txt", data); 


wc.UploadFile ("guestbook.txt", "guestbook.txt");


HttpClient exposes the same Credentials property through HttpClientHandler:


var handler = new HttpClientHandler(); 
handler.Credentials = new NetworkCredential (username, password); 
var client = new HttpClient (handler); 


This works with dialog-based authentication protocols, such as Basic and Digest, and is extensible through the AuthenticationManager class. It also supports Windows NTLM and Kerberos (if you include a domain name when constructing the NetworkCredential object). If you want to use the currently authenticated Windows user, you can leave the Credentials property null and instead set UseDefaultCredentials to true.


The authentication is ultimately handled by a WebRequest subtype (in this case, 
FtpWebRequest), which automatically negotiates a compatible protocol. In the case 
of HTTP, there can be a choice: if you examine the initial response from a Microsoft 
Exchange server web mail page, for instance, it might contain the following headers: 


HTTP/1.1 401 Unauthorized
Content-Length: 83
Content-Type: text/html
Server: Microsoft-IIS/6.0
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
WWW-Authenticate: Basic realm="exchange.somedomain.com"
X-Powered-By: ASP.NET
Date: Sat, 05 Aug 2006 12:37:23 GMT


The 401 code signals that authorization is required; the “WWW-Authenticate” headers indicate what authentication protocols are understood. If you configure a WebClient or WebRequest object with the correct username and password, however, this message will be hidden from you because the Framework responds automatically by choosing a compatible authentication protocol, and then resubmitting the original request with an extra header. Here’s an example:


Authorization: Negotiate TLRMTVNTUAAABAAAt5II2gjACDArAAACAWACACgGAAAAQ 
ATmKAAAADOLVDRdPUKSHUg9VUA== 


This mechanism provides transparency, but generates an extra round trip with each 
request. You can avoid the extra round trips on subsequent requests to the same 
URI by setting the PreAuthenticate property to true. This property is defined on 
the WebRequest class (and works only in the case of HttpWebRequest). WebClient 
doesn’t support this feature at all. 


CredentialCache 


You can force a particular authentication protocol with a CredentialCache object. A credential cache contains one or more NetworkCredential objects, each keyed to a particular protocol and URI prefix. For example, you might want to avoid the Basic protocol when logging into an Exchange Server because it transmits passwords in plain text:


CredentialCache cache = new CredentialCache();
Uri prefix = new Uri ("http://exchange.somedomain.com");
cache.Add (prefix, "Digest",    new NetworkCredential ("joe", "passwd"));
cache.Add (prefix, "Negotiate", new NetworkCredential ("joe", "passwd"));

WebClient wc = new WebClient();
wc.Credentials = cache;


An authentication protocol is specified as a string. The valid values are as follows:

Basic, Digest, NTLM, Kerberos, Negotiate


In this particular example, WebClient will choose Negotiate, because the server didn’t indicate that it supported Digest in its authentication headers. Negotiate is a Windows protocol that currently boils down to either Kerberos or NTLM, depending on the capabilities of the server, but ensures forward compatibility of your application when future security standards are deployed.


The static CredentialCache.DefaultNetworkCredentials property allows you to add the currently authenticated Windows user to the credential cache without having to specify a password:


cache.Add (prefix, "Negotiate", CredentialCache.DefaultNetworkCredentials); 
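CredentialCache implements ICredentials, so you can ask it which credential it would supply for a given URI and protocol by calling GetCredential. A sketch, rebuilding the cache from the earlier example (CredentialCacheExample is a hypothetical wrapper):

```csharp
using System;
using System.Net;

static class CredentialCacheExample
{
  // Digest and Negotiate are registered for the site prefix; Basic deliberately is not.
  public static CredentialCache Build()
  {
    var cache = new CredentialCache();
    var prefix = new Uri ("http://exchange.somedomain.com");
    cache.Add (prefix, "Digest", new NetworkCredential ("joe", "passwd"));
    cache.Add (prefix, "Negotiate", new NetworkCredential ("joe", "passwd"));
    return cache;
  }
}
```

GetCredential matches on URI prefix, so a lookup for http://exchange.somedomain.com/owa/ finds the entries registered for the site root, whereas asking for Basic returns null because no Basic credential was added.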


Authenticating via headers with HttpClient 


If you're using HttpClient, another way to authenticate is to set the authentication 
header directly: 


var client = new HttpClient();
client.DefaultRequestHeaders.Authorization =
  new AuthenticationHeaderValue ("Basic",
    Convert.ToBase64String (Encoding.UTF8.GetBytes ("username:password")));


This strategy also works with custom authentication systems such as OAuth. We 
discuss headers in more detail soon. 
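The Basic value is simply "username:password", UTF-8 encoded and then Base64-encoded. A small helper (BasicAuth.Create is a hypothetical name) makes the encoding explicit:

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

static class BasicAuth
{
  // Builds an Authorization header value of the form "Basic <base64(user:password)>".
  public static AuthenticationHeaderValue Create (string username, string password) =>
    new AuthenticationHeaderValue ("Basic",
      Convert.ToBase64String (Encoding.UTF8.GetBytes (username + ":" + password)));
}
```

Base64 is an encoding, not encryption, so Basic credentials should be sent only over HTTPS.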


Exception Handling 


WebRequest, WebResponse, WebClient, and their streams all throw a WebException in the case of a network or protocol error. HttpClient throws an HttpRequestException instead (in .NET Framework, this wraps the underlying WebException). You can determine the specific error via the WebException’s Status property; this returns a WebExceptionStatus enum that has the following members:


CacheEntryNotFound              ReceiveFailure
ConnectFailure                  RequestCanceled
ConnectionClosed                RequestProhibitedByCachePolicy
KeepAliveFailure                RequestProhibitedByProxy
MessageLengthLimitExceeded      SecureChannelFailure
NameResolutionFailure           SendFailure
Pending                         ServerProtocolViolation
PipelineFailure                 Success
ProtocolError                   Timeout
ProxyNameResolutionFailure      TrustFailure
UnknownError


An invalid domain name causes a NameResolutionFailure; a dead network causes a ConnectFailure; a request exceeding WebRequest.Timeout milliseconds causes a Timeout.
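A minimal sketch of mapping these statuses to messages follows (WebErrors.Describe is a hypothetical helper). It uses a C# 8 switch expression, and because WebException has a public (string, WebExceptionStatus) constructor, you can exercise it without a network:

```csharp
using System.Net;

static class WebErrors
{
  // Maps the WebExceptionStatus cases discussed in the text to short messages.
  public static string Describe (WebException ex) => ex.Status switch
  {
    WebExceptionStatus.NameResolutionFailure => "Bad domain name",    // invalid domain
    WebExceptionStatus.ConnectFailure        => "Could not connect",  // e.g., dead network
    WebExceptionStatus.Timeout               => "Timed out",          // exceeded WebRequest.Timeout
    WebExceptionStatus.ProtocolError         => "Protocol error",     // HTTP/FTP-level error
    _                                        => ex.Status.ToString()
  };
}
```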


Errors such as “Page not found,” “Moved Permanently,” and “Not Logged In” are specific to the HTTP or FTP protocols, and so are all lumped together under the ProtocolError status. With HttpClient, these errors are not thrown unless you call EnsureSuccessStatusCode on the response object. Prior to doing so, you can get the specific status code by querying the StatusCode property:


var client = new HttpClient();
var response = await client.GetAsync ("http://linqpad.net/foo");
HttpStatusCode responseStatus = response.StatusCode;
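HttpResponseMessage has a public constructor, so you can see how status codes behave without making a request. In this sketch (StatusCodes.CheckedCode is a hypothetical helper), EnsureSuccessStatusCode throws an HttpRequestException for anything outside the 200-299 range, and casting StatusCode to int yields the three-digit code:

```csharp
using System.Net.Http;

static class StatusCodes
{
  // Returns the numeric status code, after letting EnsureSuccessStatusCode
  // throw for unsuccessful responses.
  public static int CheckedCode (HttpResponseMessage response)
  {
    response.EnsureSuccessStatusCode();   // throws HttpRequestException unless 200-299
    return (int) response.StatusCode;     // e.g., HttpStatusCode.OK -> 200
  }
}
```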


With WebClient and WebRequest/WebResponse, you must actually catch the WebException and then:

1. Cast the WebException’s Response property to HttpWebResponse or FtpWebResponse.
2. Examine the response object’s StatusCode property (an HttpStatusCode or FtpStatusCode enum) and/or its StatusDescription property (string).

For example:


WebClient wc = new WebClient { Proxy = null };
try
{
  string s = wc.DownloadString ("http://www.albahari.com/notthere");
}
catch (WebException ex)
{
  if (ex.Status == WebExceptionStatus.NameResolutionFailure)
    Console.WriteLine ("Bad domain name");
  else if (ex.Status == WebExceptionStatus.ProtocolError)
  {
    HttpWebResponse response = (HttpWebResponse) ex.Response;
    Console.WriteLine (response.StatusDescription);      // "Not Found"
    if (response.StatusCode == HttpStatusCode.NotFound)
      Console.WriteLine ("Not there!");                   // "Not there!"
  }
  else throw;
}


If you want the three-digit status code, such as 401 or 404, simply cast the HttpStatusCode or FtpStatusCode enum to an integer. By default, you'll never get a redirection error because WebClient and WebRequest automatically follow redirection responses. You can switch off this behavior in an HttpWebRequest object by setting AllowAutoRedirect to false.


The redirection errors are 301 (Moved Permanently), 302 (Found/Redirect), and 307 (Temporary Redirect).


If an exception is thrown because you've incorrectly used the WebClient or WebRequest classes, it will more likely be an InvalidOperationException or ProtocolViolationException than a WebException.


Working with HTTP 


This section describes HTTP-specific request and response features of WebClient, 
HttpWebRequest/HttpWebResponse, and the HttpClient class. 


Headers 


WebClient, WebRequest, and HttpClient all let you add custom HTTP headers as well as enumerate the headers in a response. A header is simply a key/value pair containing metadata, such as the message content type or server software. Here's how to add a custom header to a request and then list all headers in a response message in a WebClient:


WebClient wc = new WebClient { Proxy = null }; 
wc.Headers.Add ("CustomHeader", "JustPlaying/1.0"); 
wc.DownloadString ("http://www.oreilly.com"); 


foreach (string name in wc.ResponseHeaders.Keys)
  Console.WriteLine (name + "=" + wc.ResponseHeaders [name]);

Age=51
X-Cache=HIT from oregano.bp
X-Cache-Lookup=HIT from oregano.bp:3128
Connection=keep-alive
Accept-Ranges=bytes
Content-Length=95433
Content-Type=text/html


HttpClient instead exposes strongly typed collections with properties for standard 
HTTP headers. The DefaultRequestHeaders property is for headers that apply to 
every request: 


var client = new HttpClient (handler);

client.DefaultRequestHeaders.UserAgent.Add (
  new ProductInfoHeaderValue ("VisualStudio", "2015"));

client.DefaultRequestHeaders.Add ("CustomHeader", "VisualStudio/2015");
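You can inspect these header collections without sending anything. The following sketch (HeaderExamples and the X-Request-Tag header name are hypothetical) configures default headers on a client, and adds a one-off header to an HttpRequestMessage, whose own Headers collection applies to that request alone:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;

static class HeaderExamples
{
  // DefaultRequestHeaders are sent with every request this client makes.
  public static HttpClient CreateClient()
  {
    var client = new HttpClient();
    client.DefaultRequestHeaders.UserAgent.Add (
      new ProductInfoHeaderValue ("VisualStudio", "2015"));
    client.DefaultRequestHeaders.Add ("CustomHeader", "VisualStudio/2015");
    return client;
  }

  // Headers added to an HttpRequestMessage apply to that request only.
  public static HttpRequestMessage CreateRequest (string uri)
  {
    var request = new HttpRequestMessage (HttpMethod.Get, uri);
    request.Headers.Add ("X-Request-Tag", "one-off");   // hypothetical header
    return request;
  }
}
```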


The Headers property on the HttpRequestMessage class, however, is for headers specific to a request.


Query Strings


A query string is simply a string appended to a URI with a question mark, used to 
send simple data to the server. You can specify multiple key/value pairs in a query 
string with the following syntax: 


?key1=value1&key2=value2&key3=value3...


WebClient provides an easy way to add query strings through a dictionary-style 
property. The following searches Google for the word “WebClient,” displaying the 
result page in French: 


WebClient wc = new WebClient { Proxy = null };
wc.QueryString.Add ("q", "WebClient");     // Search for "WebClient"
wc.QueryString.Add ("hl", "fr");           // Display page in French
wc.DownloadFile ("http://www.google.com/search", "results.html");


To achieve the same result with WebRequest or with HttpClient, you must manually append a correctly formatted string to the request URI:


string requestURI = "http://www.google.com/search?q=WebClient&hl=fr" ; 


If there’s a possibility of your query including symbols or spaces, you can use Uri’s 
EscapeDataString method to create a legal URI: 


string search = Uri.EscapeDataString ("(WebClient OR HttpClient)");
string language = Uri.EscapeDataString ("fr");
string requestURI = "http://www.google.com/search?q=" + search +
                    "&hl=" + language;


The resulting URI is:
http://www.google.com/search?q=(WebClient%20OR%20HttpClient)&hl=fr


(EscapeDataString is similar to EscapeUriString except that it also encodes
characters such as & and =, which would otherwise mess up the query string.)
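The same escaping step generalizes to any number of key/value pairs. Here is a sketch of a hypothetical helper (QueryStringBuilder.AppendQuery is not part of .NET). It escapes each key and value with Uri.EscapeDataString before joining them, so a value containing &, = or a space cannot break the query string:

```csharp
using System;
using System.Linq;

static class QueryStringBuilder
{
    // Escapes every key and value, joins them as key=value pairs separated
    // by &, and appends the result to baseUri after a question mark.
    public static string AppendQuery (string baseUri, params (string Key, string Value)[] pairs)
    {
        if (pairs.Length == 0) return baseUri;
        string query = string.Join ("&", pairs.Select (p =>
            Uri.EscapeDataString (p.Key) + "=" + Uri.EscapeDataString (p.Value)));
        return baseUri + "?" + query;
    }
}
```

For example, AppendQuery ("http://www.google.com/search", ("q", "a b&c=d"), ("hl", "fr")) yields a URI whose q value is a%20b%26c%3Dd, so the embedded & and = remain part of the value.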


Uploading Form Data 


WebClient provides UploadValues methods for posting data to an HTML form:


WebClient wc = new WebClient { Proxy = null }; 


var data = new System.Collections.Specialized.NameValueCollection(); 
data.Add ("Name", "Joe Albahari"); 
data.Add ("Company", "O'Reilly"); 


byte[] result = wc.UploadValues ("http://www.albahari.com/EchoPost.aspx", 
"POST", data); 


Console.WriteLine (Encoding.UTF8.GetString (result)); 


The keys in the NameValueCollection, such as Name and Company, correspond to the
names of input boxes on the HTML form.
Uploading form data is more work via WebRequest. (You'll need to take this route if
you need to use features such as cookies.) Here's the procedure: 


1. Set the request’s ContentType to "application/x-www-form-urlencoded" and
   its Method to "POST".
2. Build a string containing the data to upload, encoded as follows:

   name1=value1&name2=value2&name3=value3...

3. Convert the string to a byte array, with Encoding.UTF8.GetBytes.
4. Set the web request’s ContentLength property to the byte array length.
5. Call GetRequestStream on the web request and write the data array.
6. Call GetResponse to read the server's response.
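Steps 2 and 3 can be sketched as follows. FormEncoder is a hypothetical helper. It assumes WebUtility.UrlEncode (in System.Net), which encodes a space as + (the same Joe+Albahari form used in this section) and percent-encodes reserved characters such as &:

```csharp
using System.Linq;
using System.Net;
using System.Text;

static class FormEncoder
{
    // Step 2: encode each field as name=value and join the fields with &.
    public static string Encode (params (string Name, string Value)[] fields) =>
        string.Join ("&", fields.Select (f =>
            WebUtility.UrlEncode (f.Name) + "=" + WebUtility.UrlEncode (f.Value)));

    // Step 3: the bytes to write to the request stream; their length is
    // what you would assign to ContentLength in step 4.
    public static byte[] ToBytes (string encoded) => Encoding.UTF8.GetBytes (encoded);
}
```

For instance, Encode (("Name", "Joe Albahari"), ("Company", "A&B")) produces Name=Joe+Albahari&Company=A%26B.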


Here's the previous example written with WebRequest: 


var req = WebRequest.Create ("http://www.albahari.com/EchoPost.aspx"); 
req.Proxy = null; 

req.Method = "POST"; 

req.ContentType = "application/x-www-form-urlencoded"; 


string reqString = "Name=Joe+Albahari&Company=O'Reilly";
byte[] reqData = Encoding.UTF8.GetBytes (reqString); 
req.ContentLength = reqData.Length; 


using (Stream reqStream = req.GetRequestStream()) 
reqStream.Write (reqData, 0, reqData.Length); 


using (WebResponse res = req.GetResponse())
using (Stream resStream = res.GetResponseStream())
using (StreamReader sr = new StreamReader (resStream))
  Console.WriteLine (sr.ReadToEnd());


With HttpClient, you instead create and populate a FormUrlEncodedContent object,
which you can then either pass into the PostAsync method or assign to a request’s
Content property:


string uri = "http://www.albahari.com/EchoPost.aspx"; 
var client = new HttpClient(); 
var dict = new Dictionary<string,string>
{
  { "Name", "Joe Albahari" },
  { "Company", "O'Reilly" }
};
var values = new FormUrlEncodedContent (dict);
var response = await client.PostAsync (uri, values);
response.EnsureSuccessStatusCode();
Console.WriteLine (await response.Content.ReadAsStringAsync());
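FormUrlEncodedContent is an ordinary HttpContent, so you can read back the encoded body without sending anything. This is handy for checking what the server will receive. The following is a minimal sketch. FormContentPreview is an illustrative name, and an array of pairs replaces the Dictionary so that the field order is explicit:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

static class FormContentPreview
{
    // Returns the body that PostAsync would send, plus its media type,
    // without making any network call.
    public static async Task<(string Body, string MediaType)> PreviewAsync()
    {
        var fields = new[]
        {
            new KeyValuePair<string, string> ("Name", "Joe Albahari"),
            new KeyValuePair<string, string> ("Company", "O'Reilly")
        };
        var content = new FormUrlEncodedContent (fields);
        string body = await content.ReadAsStringAsync();
        return (body, content.Headers.ContentType.MediaType);
    }
}
```

The body starts with Name=Joe+Albahari, and the content type is application/x-www-form-urlencoded, the same values that the WebRequest version sets by hand.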
Cookies


A cookie is a name/value string pair that an HTTP server sends to a client in a 
response header. A web browser client typically remembers cookies and replays 
them to the server in each subsequent request (to the same address) until their 
expiry. A cookie allows a server to know whether it’s talking to the same client it was 
a minute ago—or yesterday—without needing a messy query string in the URI. 


By default, HttpWebRequest ignores any cookies received from the server. To accept 
cookies, create a CookieContainer object and assign it to the WebRequest. The 
cookies received in a response can then be enumerated: 


var cc = new CookieContainer(); 


var request = (HttpWebRequest) WebRequest.Create ("http://www.google.com"); 
request.Proxy = null; 

request.CookieContainer = cc; 

using (var response = (HttpWebResponse) request.GetResponse()) 


{ 
foreach (Cookie c in response.Cookies) 
{ 
Console.WriteLine (" Name: "+ c.Name); 
Console.WriteLine (" Value: " + c.Value); 
Console.WriteLine (" Path: "+ c.Path);
Console.WriteLine (" Domain: " + c.Domain); 
} 
// Read response stream... 
} 
Name: PREF
Value: ID=6b10df1da493a9c4:TM=1179025486:LM=1179025486:S=EJCZriQaWEHLk4tt
Path: /
Domain: .google.com

To do the same with HttpClient, first instantiate an HttpClientHandler:


var cc = new CookieContainer(); 

var handler = new HttpClientHandler(); 
handler.CookieContainer = cc; 

var client = new HttpClient (handler); 


The WebClient facade class does not support cookies. 


To replay the received cookies in future requests, simply assign the same CookieContainer
object to each new WebRequest object, or with HttpClient, keep using
the same object to make requests. CookieContainer is serializable, so it can be written
to disk (see Chapter 17). Alternatively, you can start with a fresh CookieContainer,
and then add cookies manually, as follows:


Cookie c = new Cookie ("PREF",
                       "ID=6b10df1da493a9c4:TM=1179...",
                       "/",
                       ".google.com");
freshCookieContainer.Add (c);


The third and fourth arguments indicate the path and domain of the originator. A
CookieContainer on the client can house cookies from many different places; WebRequest
sends only those cookies whose path and domain match those of the server.
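You can observe this matching rule without any network traffic by asking a CookieContainer which Cookie header it would send for a given URI. The following sketch uses an illustrative cookie value, and the CookieMatching class is hypothetical:

```csharp
using System;
using System.Net;

static class CookieMatching
{
    // One cookie for path "/" on .google.com. The leading dot makes the
    // cookie apply to subdomains such as www.google.com.
    public static CookieContainer Build()
    {
        var container = new CookieContainer();
        container.Add (new Cookie ("PREF", "ID=123", "/", ".google.com"));
        return container;
    }

    // Returns the Cookie header value that would accompany a request to uri
    // (an empty string if no stored cookie matches).
    public static string HeaderFor (CookieContainer container, string uri) =>
        container.GetCookieHeader (new Uri (uri));
}
```

A request to http://www.google.com/search would carry PREF=ID=123. A request to an unrelated host such as http://www.example.com/ would carry no cookie.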


Writing an HTTP Server 


You can write your own .NET HTTP server with the HttpListener class. The following
is a simple server that listens on port 51111, waits for a single client request,
and then returns a one-line reply:


static void Main()
{
  using var server = new SimpleHttpServer();

  // Make a client request:
  Console.WriteLine (new WebClient().DownloadString
    ("http://localhost:51111/MyApp/Request.txt"));
}

class SimpleHttpServer : IDisposable
{
  readonly HttpListener listener = new HttpListener();

  public SimpleHttpServer() => ListenAsync();

  async void ListenAsync()
  {
    listener.Prefixes.Add ("http://localhost:51111/MyApp/");  // Listen on
    listener.Start();                                         // port 51111

    // Await a client request:
    HttpListenerContext context = await listener.GetContextAsync();

    // Respond to the request:
    string msg = "You asked for: " + context.Request.RawUrl;
    context.Response.ContentLength64 = Encoding.UTF8.GetByteCount (msg);
    context.Response.StatusCode = (int)HttpStatusCode.OK;

    using (Stream s = context.Response.OutputStream)
    using (StreamWriter writer = new StreamWriter (s))
      await writer.WriteAsync (msg);
  }

  public void Dispose() => listener.Close();
}


OUTPUT: You asked for: /MyApp/Request.txt 


On Windows, HttpListener does not internally use .NET Socket objects; it instead
calls the Windows HTTP Server API. This allows many applications on a computer
to listen on the same IP address and port, as long as each registers different address
prefixes. In our example, we registered the prefix http://localhost:51111/MyApp/, so
another application would be free to listen on the same IP and port on another prefix
such as http://localhost:51111/AnotherApp/. This is of value because opening new ports
on corporate firewalls can be politically arduous.


HttpListener waits for the next client request when you call GetContext, returning 
an object with Request and Response properties. Each is analogous to a WebRequest 
and WebResponse object, but from the server’s perspective. You can read and write 
headers and cookies, for instance, to the request and response objects, much as you 
would at the client end. 


You can choose how fully to support features of the HTTP protocol, based on your
anticipated client audience. At a bare minimum, you should set the content length
and status code on each response.


Here’s a very simple web page server, written asynchronously:


using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

class WebServer
{
  HttpListener _listener;
  string _baseFolder;      // Your web page folder

  public WebServer (string uriPrefix, string baseFolder)
  {
    _listener = new HttpListener();
    _listener.Prefixes.Add (uriPrefix);
    _baseFolder = baseFolder;
  }

  public async void Start()
  {
    _listener.Start();
    while (true)
      try
      {
        var context = await _listener.GetContextAsync();
        Task.Run (() => ProcessRequestAsync (context));
      }
      catch (HttpListenerException)     { break; }   // Listener stopped.
      catch (InvalidOperationException) { break; }   // Listener stopped.
  }

  public void Stop() => _listener.Stop();

  async void ProcessRequestAsync (HttpListenerContext context)
  {
    try
    {
      string filename = Path.GetFileName (context.Request.RawUrl);
      string path = Path.Combine (_baseFolder, filename);
      byte[] msg;
      if (!File.Exists (path))
      {
        Console.WriteLine ("Resource not found: " + path);
        context.Response.StatusCode = (int) HttpStatusCode.NotFound;
        msg = Encoding.UTF8.GetBytes ("Sorry, that page does not exist");
      }
      else
      {
        context.Response.StatusCode = (int) HttpStatusCode.OK;
        msg = File.ReadAllBytes (path);
      }
      context.Response.ContentLength64 = msg.Length;
      using (Stream s = context.Response.OutputStream)
        await s.WriteAsync (msg, 0, msg.Length);
    }
    catch (Exception ex) { Console.WriteLine ("Request error: " + ex); }
  }
}


Here's a main method to set things in motion:


static void Main() 


{ 


// Listen on port 51111, serving files in d:\webroot: 
var server = new WebServer ("http://localhost:51111/", @"d:\webroot"); 
try 
{ 
server.Start(); 
Console.WriteLine ("Server running... press Enter to stop"); 
Console.ReadLine(); 


} 
finally { server.Stop(); } 


} 


You can test this at the client end with any web browser; the URI in this case will be 
http://localhost:51111/ plus the name of the web page. 


HttpListener will not start if other software is competing for 
the same port (unless that software also uses the Windows 
HTTP Server API). Examples of applications that might listen 
on the default port 80 include a web server or a peer-to-peer 
program such as Skype. 


Our use of asynchronous functions makes this server scalable and efficient. Starting
this from a user interface (UI) thread, however, would hinder scalability because for
each request, execution would bounce back to the UI thread after each await. Incurring
such overhead is particularly pointless given that we don't have shared state, so
in a UI scenario we'd get off the UI thread, either like this:


Task.Run (Start); 


or by calling ConfigureAwait(false) after calling GetContextAsync. 
Note that we used Task.Run to call ProcessRequestAsync even though the method
was already asynchronous. This allows the caller to process another request immediately
rather than having to first wait out the synchronous phase of the method (up
until the first await).


Using FTP 


For simple FTP upload and download operations, you can use WebClient, as we did 
previously: 


WebClient wc = new WebClient { Proxy = null }; 

wc.Credentials = new NetworkCredential ("myuser", "mypassword"); 
wc.BaseAddress = "ftp://ftp.myserver.com"; 

wc.UploadString ("tempfile.txt", "hello!"); 

Console.WriteLine (wc.DownloadString ("tempfile.txt")); // hello! 


There’s more to FTP, however, than just uploading and downloading files. The protocol
also defines a set of commands or “methods,” which are exposed as string constants
in WebRequestMethods.Ftp:


AppendFile              MakeDirectory
DeleteFile              PrintWorkingDirectory
DownloadFile            RemoveDirectory
GetDateTimestamp        Rename
GetFileSize             UploadFile
ListDirectory           UploadFileWithUniqueName
ListDirectoryDetails


To run one of these commands, you assign its string constant to the web request’s
Method property and then call GetResponse(). Here's how to get a directory listing:


var req = (FtpWebRequest) WebRequest.Create ("ftp://ftp.myserver.com"); 
req.Proxy = null; 

req.Credentials = new NetworkCredential ("myuser", "mypassword"); 
req.Method = WebRequestMethods.Ftp.ListDirectory; 


using (WebResponse resp = req.GetResponse()) 
using (StreamReader reader = new StreamReader (resp.GetResponseStream())) 
  Console.WriteLine (reader.ReadToEnd());


RESULT: 


guestbook.txt
tempfile.txt
test.doc 


In the case of getting a directory listing, we needed to read the response stream to
get the result. Most other commands, however, don’t require this step. For instance,
to get the result of the GetFileSize command, just query the response’s ContentLength
property:
var req = (FtpWebRequest) WebRequest.Create (
                          "ftp://ftp.myserver.com/tempfile.txt");
req.Proxy = null;
req.Credentials = new NetworkCredential ("myuser", "mypassword");

req.Method = WebRequestMethods.Ftp.GetFileSize;

using (WebResponse resp = req.GetResponse())
  Console.WriteLine (resp.ContentLength);            // 6


The GetDateTimestamp command works in a similar way except that you query the 
response’s LastModified property. This requires that you cast to FtpWebResponse: 


req.Method = WebRequestMethods.Ftp.GetDateTimestamp; 


using (var resp = (FtpWebResponse) req.GetResponse())
  Console.WriteLine (resp.LastModified);


To use the Rename command, you must populate the request’s RenameTo property
with the new filename (without a directory prefix). For example, to rename a file in
the incoming directory from tempfile.txt to deleteme.txt:


var req = (FtpWebRequest) WebRequest.Create (
                          "ftp://ftp.myserver.com/tempfile.txt");
req.Proxy = null;
req.Credentials = new NetworkCredential ("myuser", "mypassword");

req.Method = WebRequestMethods.Ftp.Rename;
req.RenameTo = "deleteme.txt";


req.GetResponse().Close(); // Perform the rename 
Here's how to delete a file: 


var req = (FtpWebRequest) WebRequest.Create (
                          "ftp://ftp.myserver.com/deleteme.txt");
req.Proxy = null; 
req.Credentials = new NetworkCredential ("myuser", "mypassword"); 


req.Method = WebRequestMethods.Ftp.DeleteFile; 


req.GetResponse().Close(); // Perform the deletion 
In all these examples, you would typically use an exception-handling
block to catch network and protocol errors. A typical
catch block looks like this:

catch (WebException ex)
{
  if (ex.Status == WebExceptionStatus.ProtocolError)
  {
    // Obtain more detail on error:
    var response = (FtpWebResponse) ex.Response;
    FtpStatusCode errorCode = response.StatusCode;
    string errorMessage = response.StatusDescription;
    // ...
  }
  // ...
}


Using DNS 


The static Dns class encapsulates the DNS, which converts between a raw IP address, 
such as 66.135.192.87, and a human-friendly domain name, such as ebay.com. 


The GetHostAddresses method converts from domain name to IP address (or 
addresses): 


foreach (IPAddress a in Dns.GetHostAddresses ("albahari.com")) 
Console.WriteLine (a.ToString()); // 205.210.42.167 


The GetHostEntry method goes the other way around, converting from address to 
domain name: 


IPHostEntry entry = Dns.GetHostEntry ("205.210.42.167"); 
Console.WriteLine (entry.HostName);                  // albahari.com


GetHostEntry also accepts an IPAddress object, so you can specify an IP address as 
a byte array: 


IPAddress address = new IPAddress (new byte[] { 205, 210, 42, 167 }); 
IPHostEntry entry = Dns.GetHostEntry (address); 
Console.WriteLine (entry.HostName);                  // albahari.com
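The byte-array form and the dotted-string form describe the same address. This small sketch shows that they are equal (AddressForms is a hypothetical helper):

```csharp
using System.Net;

static class AddressForms
{
    // The address used in this section, built from its four bytes...
    public static IPAddress FromBytes() => new IPAddress (new byte[] { 205, 210, 42, 167 });

    // ...and parsed from its dotted-decimal string.
    public static IPAddress FromString() => IPAddress.Parse ("205.210.42.167");
}
```

Either object can be passed to Dns.GetHostEntry, and GetAddressBytes converts an IPAddress back to its byte array.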


Domain names are automatically resolved to IP addresses when you use a class such
as WebRequest or TcpClient. However, if you plan to make many network requests
to the same address over the life of an application, you can sometimes improve
performance by first using Dns to explicitly convert the domain name into an IP
address, and then communicating directly with the IP address from that point on.
This avoids repeated round-tripping to resolve the same domain name, and it can
be of benefit when working at the transport layer (via TcpClient, UdpClient, or
Socket).
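One way to apply this advice is to cache the resolution for each host name. The following sketch defines a hypothetical DnsCache, which is not a .NET type. It stores the Task returned by Dns.GetHostAddressesAsync so that callers asking for the same host share one lookup. It assumes addresses don't change while the application runs: it ignores DNS time-to-live values and would also cache a failed lookup.

```csharp
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;

static class DnsCache
{
    static readonly ConcurrentDictionary<string, Task<IPAddress[]>> _cache =
        new ConcurrentDictionary<string, Task<IPAddress[]>>();

    // The first call for a host starts the lookup; later calls get the
    // same stored task (and therefore the same result).
    public static Task<IPAddress[]> GetAddressesAsync (string host) =>
        _cache.GetOrAdd (host, h => Dns.GetHostAddressesAsync (h));
}
```

A caller would then connect with something like client.ConnectAsync ((await DnsCache.GetAddressesAsync ("albahari.com"))[0], port).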


The Dns class also provides awaitable task-based asynchronous methods:


foreach (IPAddress a in await Dns.GetHostAddressesAsync ("albahari.com")) 
Console.WriteLine (a.ToString()); 
Sending Mail with SmtpClient


The SmtpClient class in the System.Net.Mail namespace allows you to send mail
messages through the ubiquitous Simple Mail Transfer Protocol, or SMTP. To send
a simple text message, instantiate SmtpClient, set its Host property to your SMTP
server address, and then call Send:


SmtpClient client = new SmtpClient(); 
client.Host = "mail.myserver.com"; 
client.Send ("from@adomain.com", "to@adomain.com", "subject", "body"); 


Constructing a MailMessage object exposes further options, including the ability to 
add attachments: 


SmtpClient client = new SmtpClient();
client.Host = "mail.myisp.net";
MailMessage mm = new MailMessage();

mm.Sender = new MailAddress ("kay@domain.com", "Kay");
mm.From   = new MailAddress ("kay@domain.com", "Kay");
mm.To.Add  (new MailAddress ("bob@domain.com", "Bob"));
mm.CC.Add  (new MailAddress ("dan@domain.com", "Dan"));
mm.Subject = "Hello!";
mm.Body = "Hi there. Here's the photo!";
mm.IsBodyHtml = false;
mm.Priority = MailPriority.High;

Attachment a = new Attachment ("photo.jpg",
                               System.Net.Mime.MediaTypeNames.Image.Jpeg);
mm.Attachments.Add (a);
client.Send (mm);


To frustrate spammers, most SMTP servers on the internet accept only authenticated
connections and require communication over SSL:


var client = new SmtpClient ("smtp.myisp.com", 587)
{
  Credentials = new NetworkCredential ("me@myisp.com", "MySecurePass"),
  EnableSsl = true
};
client.Send ("me@myisp.com", "someone@somewhere.com", "Subject", "Body");
Console.WriteLine ("Sent");


By changing the DeliveryMethod property, you can instruct the SmtpClient to 
instead use IIS to send mail messages or simply to write each message to an .eml file 
in a specified directory. This can be useful during development: 


SmtpClient client = new SmtpClient(); 
client.DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory; 
client.PickupDirectoryLocation = @"c:\mail"; 
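The pickup-directory option lets you check what SmtpClient would send without contacting a mail server. The following sketch assumes the directory already exists and is given as an absolute path. PickupDemo is a hypothetical name:

```csharp
using System.Net.Mail;

static class PickupDemo
{
    // Writes one message as an .eml file into directory instead of
    // sending it over the network; no Host is needed in this mode.
    public static void WriteTestMessage (string directory)
    {
        using var client = new SmtpClient
        {
            DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
            PickupDirectoryLocation = directory
        };
        client.Send ("me@myisp.com", "someone@somewhere.com", "Subject", "Body");
    }
}
```

Each call leaves one new .eml file in the directory, which you can open in a mail client to inspect the headers and body.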
Using TCP


TCP and UDP constitute the transport layer protocols on top of which most internet
and LAN services are built. HTTP (version 2 and below), FTP, and SMTP
use TCP; DNS and HTTP version 3 use UDP. TCP is connection-oriented and
includes reliability mechanisms; UDP is connectionless, has a lower overhead, and
supports broadcasting. BitTorrent uses UDP, as does Voice over IP (VoIP).


The transport layer offers greater flexibility—and potentially improved performance 
—over the higher layers, but it requires that you handle such tasks as authentication 
and encryption yourself. 


With TCP in .NET Core, you have a choice of either the easier-to-use TcpClient
and TcpListener facade classes, or the feature-rich Socket class. (In fact, you can
mix and match, because TcpClient exposes the underlying Socket object through
the Client property.) The Socket class exposes more configuration options and
allows direct access to the network layer (IP) and non-internet-based protocols such
as Novell’s SPX/IPX.


(TCP and UDP communication is also possible via WinRT types: see “TCP in
UWP” on page 722.)


As with other protocols, TCP differentiates a client and server: the client initiates a
request, while the server waits for a request. Here's the basic structure for a
synchronous TCP client request:


using (TcpClient client = new TcpClient()) 
{ 


client.Connect ("address", port); 
using (NetworkStream n = client.GetStream()) 


{ 


// Read and write to the network stream... 
} 
} 


TcpClient’s Connect method blocks until a connection is established (ConnectAsync
is the asynchronous equivalent). The NetworkStream then provides a means
of two-way communication, for both transmitting and receiving bytes of data from
a server.


A simple TCP server looks like this: 


TcpListener listener = new TcpListener (<ip address>, port); 
listener.Start(); 


while (keepProcessingRequests) 
using (TcpClient c = listener.AcceptTcpClient()) 
using (NetworkStream n = c.GetStream()) 


{ 


// Read and write to the network stream... 


} 


listener.Stop(); 
TcpListener requires the local IP address on which to listen (a computer with two
network cards, for instance, can have two addresses). You can use IPAddress.Any to
instruct it to listen on all (or the only) local IP addresses. AcceptTcpClient blocks
until a client request is received (again, there’s also an asynchronous version), at
which point we call GetStream, just as on the client side.


When working at the transport layer, you need to decide on a protocol for who talks
when, and for how long, rather like with a walkie-talkie. If both parties talk or
listen at the same time, communication breaks down!


Let’s invent a protocol in which the client speaks first, saying “Hello,” and then the 
server responds by saying “Hello right back!” Here’s the code: 


using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class TcpDemo
{
  static void Main()
  {
    new Thread (Server).Start();       // Run server method concurrently.
    Thread.Sleep (500);                // Give server time to start.
    Client();
  }

  static void Client()
  {
    using (TcpClient client = new TcpClient ("localhost", 51111))
    using (NetworkStream n = client.GetStream())
    {
      BinaryWriter w = new BinaryWriter (n);
      w.Write ("Hello");
      w.Flush();
      Console.WriteLine (new BinaryReader (n).ReadString());
    }
  }

  static void Server()     // Handles a single client request, then exits.
  {
    TcpListener listener = new TcpListener (IPAddress.Any, 51111);
    listener.Start();
    using (TcpClient c = listener.AcceptTcpClient())
    using (NetworkStream n = c.GetStream())
    {
      string msg = new BinaryReader (n).ReadString();
      BinaryWriter w = new BinaryWriter (n);
      w.Write (msg + " right back!");
      w.Flush();                      // Must call Flush because we're not
    }                                 // disposing the writer.
    listener.Stop();
  }
}

// OUTPUT: Hello right back!


In this example, we're using the localhost loopback to run the client and server on 
the same machine. We've arbitrarily chosen a port in the unallocated range (above 
49152) and used a BinaryWriter and BinaryReader to encode the text messages. 
We've avoided closing or disposing the readers and writers in order to keep the 
underlying NetworkStream open until our conversation completes. 


BinaryReader and BinaryWriter might seem like odd choices for reading and writing
strings. However, they have a major advantage over StreamReader and StreamWriter:
they prefix strings with an integer indicating the length, so a BinaryReader
always knows exactly how many bytes to read. If you call StreamReader.ReadToEnd
you might block indefinitely, because a NetworkStream doesn't have an end! As
long as the connection is open, the network stream can never be sure that the client
isn’t going to send more data.
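You can see the length prefix by writing to a MemoryStream instead of a NetworkStream. This sketch writes two strings back to back and reads them back; LengthPrefixDemo is an illustrative name. With the default UTF-8 encoding, each short string is preceded by a single byte that holds its length:

```csharp
using System.IO;

static class LengthPrefixDemo
{
    // Writes "Hello" and "World!" consecutively, then reads both back.
    // The length prefixes tell the reader where each string ends.
    public static (byte[] Bytes, string First, string Second) RoundTrip()
    {
        var stream = new MemoryStream();
        var writer = new BinaryWriter (stream);
        writer.Write ("Hello");
        writer.Write ("World!");
        writer.Flush();

        stream.Position = 0;
        var reader = new BinaryReader (stream);
        return (stream.ToArray(), reader.ReadString(), reader.ReadString());
    }
}
```

The 13 bytes produced are 5, "Hello", 6, "World!". The reader consumes exactly six bytes for the first string and never needs to look for an end of stream.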


StreamReader is in fact completely out of bounds with
NetworkStream, even if you plan only to call ReadLine. This is
because StreamReader has a read-ahead buffer, which can
result in it reading more bytes than are currently available,
blocking indefinitely (or until the socket times out). Other
streams such as FileStream don't suffer this incompatibility
with StreamReader because they have a definite end, at which
point Read returns immediately with a value of 0.


Concurrency with TCP 


TcpClient and TcpListener offer task-based asynchronous methods for scalable 
concurrency. Using these is simply a question of replacing blocking method calls 
with their *Async versions, and awaiting the task that’s returned. 


In the following example, we write an asynchronous TCP server that accepts 
requests of 5,000 bytes in length, reverses the bytes, and then sends them back to the 
client: 


async void RunServerAsync () 


{ 
var listener = new TcpListener (IPAddress.Any, 51111); 


listener.Start (); 
try 
{ 
while (true) 
Accept (await listener.AcceptTcpClientAsync ()); 
} 
finally { listener.Stop(); } 
} 


async Task Accept (TcpClient client)
{
  await Task.Yield ();
  try
  {
    using (client)
    using (NetworkStream n = client.GetStream ())
    {
      byte[] data = new byte [5000];

      int bytesRead = 0; int chunkSize = 1;
      while (bytesRead < data.Length && chunkSize > 0)
        bytesRead += chunkSize =
          await n.ReadAsync (data, bytesRead, data.Length - bytesRead);

      Array.Reverse (data);   // Reverse the byte sequence
      await n.WriteAsync (data, 0, data.Length);
    }
  }
  catch (Exception ex) { Console.WriteLine (ex.Message); }
}


Such a program is scalable in that it does not block a thread for the duration of a
request. So, if 1,000 clients were to connect at once over a slow network connection
(so that each request took several seconds from start to finish, for example), this
program would not require 1,000 threads for that time (unlike with a synchronous
solution). Instead, it leases threads only for the small periods of time required to
execute code before and after the await expressions.
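To exercise the read loop end to end, you could run a server and client in the same process over the loopback adapter. The following sketch makes these assumptions: ReverseOverLoopback and ReadFullyAsync are hypothetical names, the server handles only one client, and port 0 lets the operating system pick a free port instead of hardcoding 51111:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

static class ReverseOverLoopback
{
    // Reads until buffer is full or the peer closes the connection.
    // ReadAsync may return fewer bytes than requested, hence the loop.
    static async Task<int> ReadFullyAsync (NetworkStream n, byte[] buffer)
    {
        int total = 0, chunk = 1;
        while (total < buffer.Length && chunk > 0)
            total += chunk = await n.ReadAsync (buffer, total, buffer.Length - total);
        return total;
    }

    // Sends payload to a one-shot server that reverses it, and returns the reply.
    public static async Task<byte[]> RunAsync (byte[] payload)
    {
        var listener = new TcpListener (IPAddress.Loopback, 0);   // 0 = any free port
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        try
        {
            Task server = Task.Run (async () =>
            {
                using TcpClient c = await listener.AcceptTcpClientAsync();
                using NetworkStream n = c.GetStream();
                var data = new byte [payload.Length];
                await ReadFullyAsync (n, data);
                Array.Reverse (data);
                await n.WriteAsync (data, 0, data.Length);
            });

            using var client = new TcpClient();
            await client.ConnectAsync (IPAddress.Loopback, port);
            using NetworkStream stream = client.GetStream();
            await stream.WriteAsync (payload, 0, payload.Length);
            var reply = new byte [payload.Length];
            await ReadFullyAsync (stream, reply);
            await server;                 // Surface any server-side exception.
            return reply;
        }
        finally { listener.Stop(); }
    }
}
```

Because both sides agree in advance on the message length, neither needs to wait for the other to close the connection before it knows the message is complete.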


Receiving POP3 Mail with TCP 


.NET Core provides no application-layer support for POP3, so you need to write at
the TCP layer in order to receive mail from a POP3 server. Fortunately, this is a
simple protocol; a POP3 conversation goes like this:


Client                 Mail server                 Notes
Client connects...     +OK Hello there.            Welcome message
USER joe               +OK Password required.
PASS password          +OK Logged in.
LIST                   +OK                         Lists the ID and file size of
                       1 1876                      each message on the server
                       2 5412
                       3 845
                       .
RETR 1                 +OK 1876 octets             Retrieves the message with
                       Content of message #1...    the specified ID
                       .
DELE 1                 +OK Deleted.                Deletes a message from the server
QUIT                   +OK Bye-bye.
Each command and response is terminated by a newline (CR + LF), except for the
responses to the multiline LIST and RETR commands, which are terminated by a single
dot on a separate line. Because we can’t use StreamReader with NetworkStream, we can
start by writing a helper method to read a line of text in a nonbuffered fashion:


static string ReadLine (Stream s) 
{ 
List<byte> lineBuffer = new List<byte>(); 
while (true) 
{ 
int b = s.ReadByte(); 
if (b == 10 || b < 0) break; 
if (b != 13) lineBuffer.Add ((byte)b); 
} 
return Encoding.UTF8.GetString (lineBuffer.ToArray()); 
} 
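A multiline response (to LIST or RETR) can be read with the same byte-at-a-time approach. The following self-contained sketch shows a hypothetical Pop3Multiline helper. It repeats a ReadLine variant that returns null at end of stream, so a truncated response can't loop forever. It also removes the extra leading dot that RFC 1939 requires servers to add to any content line beginning with a dot:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class Pop3Multiline
{
    // Reads one line a byte at a time (dropping CR, stopping at LF), so
    // nothing beyond the line is consumed. Returns null at end of stream.
    public static string ReadLine (Stream s)
    {
        var buffer = new List<byte>();
        while (true)
        {
            int b = s.ReadByte();
            if (b < 0) return buffer.Count == 0 ? null : Encoding.UTF8.GetString (buffer.ToArray());
            if (b == 10) break;
            if (b != 13) buffer.Add ((byte)b);
        }
        return Encoding.UTF8.GetString (buffer.ToArray());
    }

    // Reads the body of a multiline response up to the terminating "." line,
    // undoing dot-stuffing (a content line ".text" arrives as "..text").
    public static List<string> ReadMultiline (Stream s)
    {
        var lines = new List<string>();
        while (true)
        {
            string line = ReadLine (s);
            if (line == null || line == ".") break;
            if (line.Length > 0 && line[0] == '.') line = line.Substring (1);
            lines.Add (line);
        }
        return lines;
    }
}
```

Feeding it "1 1876", "..hidden", "." and "+OK next" yields the list ["1 1876", ".hidden"], and the next ReadLine returns "+OK next". This shows that no bytes past the terminator were consumed.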


We also need a helper method to send a command. Because we always expect to 
receive a response starting with +OK, we can read and validate the response at the 
same time: 


static void SendCommand (Stream stream, string line)
{
  byte[] data = Encoding.UTF8.GetBytes (line + "\r\n");
  stream.Write (data, 0, data.Length);
  string response = ReadLine (stream);
  if (!response.StartsWith ("+OK"))
    throw new Exception ("POP Error: " + response);
}


With these methods written, the job of retrieving mail is easy. We establish a TCP 
connection on port 110 (the default POP3 port), and then start talking to the server. 
In this example, we write each mail message to a randomly named file with an .eml 
extension, before deleting the message off the server: 


using (TcpClient client = new TcpClient ("mail.isp.com", 110))
using (NetworkStream n = client.GetStream())
{
  ReadLine (n);                             // Read the welcome message.
  SendCommand (n, "USER username");
  SendCommand (n, "PASS password");
  SendCommand (n, "LIST");                  // Retrieve message IDs.
  List<int> messageIDs = new List<int>();
  while (true)
  {
    string line = ReadLine (n);             // e.g., "1 1876"
    if (line == ".") break;
    messageIDs.Add (int.Parse (line.Split (' ')[0] ));   // Message ID
  }
  foreach (int id in messageIDs)            // Retrieve each message.
  {
    SendCommand (n, "RETR " + id);
    string randomFile = Guid.NewGuid().ToString() + ".eml";
    using (StreamWriter writer = File.CreateText (randomFile))
      while (true)
      {
        string line = ReadLine (n);         // Read next line of message.
        if (line == ".") break;             // Single dot = end of message.
        if (line.StartsWith ("..")) line = line.Substring (1);  // Undo dot-stuffing.
        writer.WriteLine (line);            // Write to output file.
      }
    SendCommand (n, "DELE " + id);          // Delete message off server.
  }
  SendCommand (n, "QUIT");
}


You can find open source POP3 libraries on NuGet that provide
support for protocol aspects such as authentication,
TLS/SSL connections, MIME parsing, and more.


TCP in UWP 


In UWP applications, TCP functionality is exposed through WinRT types in the
Windows.Networking.Sockets namespace. As with the .NET implementation, there
are two primary classes to handle server and client roles: StreamSocketListener
and StreamSocket.


Your application manifest must declare the capability Internet
(Client) if the host is on the internet or Private Networks (Client
& Server) if the host is private (including localhost).


The following method starts a server on port 51111 and waits for a client to connect.
It then reads a single message comprising a length-prefixed string:


async void Server()
{
  var listener = new StreamSocketListener();
  listener.ConnectionReceived += async (sender, args) =>
  {
    using (StreamSocket socket = args.Socket)
    {
      var reader = new DataReader (socket.InputStream);
      await reader.LoadAsync (4);
      uint length = reader.ReadUInt32();
      await reader.LoadAsync (length);
      Debug.WriteLine (reader.ReadString (length));
    }
    listener.Dispose();   // Close listener after one message.
  };
  await listener.BindServiceNameAsync ("51111");
}


In this example, we used a WinRT type called DataReader (in Windows.Storage.Streams)
to read from the input stream, rather than converting to a .NET Stream object and
using a BinaryReader. DataReader is rather like BinaryReader except that it supports
asynchrony. The LoadAsync method asynchronously reads a specified number
of bytes into an internal buffer, which then allows you to call methods such as
ReadUInt32 or ReadString. The idea is that if you wanted to, say, read 1,000 integers
in a row, you'd first call LoadAsync with a value of 4000, and then ReadInt32
1,000 times in a loop. This avoids the overhead of calling asynchronous operations
in a loop (because each asynchronous operation incurs a small overhead).

722 | Chapter 16: Networking


DataReader/DataWriter have a ByteOrder property to control
whether numbers are encoded in big- or little-endian format.
Big-endian is the default.
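If the other end of the connection is a .NET program rather than a UWP app, it must use the same byte order. BinaryWriter and BinaryReader always use little-endian, so their Write(uint)/ReadUInt32 won't match DataWriter's default. The following sketch uses System.Buffers.Binary.BinaryPrimitives to produce the frame the UWP client in this section writes: a big-endian uint32 byte count followed by UTF-8 text. BigEndianFraming is a hypothetical name:

```csharp
using System.Buffers.Binary;
using System.Text;

static class BigEndianFraming
{
    // Builds [4-byte big-endian length][UTF-8 bytes of message].
    public static byte[] Frame (string message)
    {
        byte[] text = Encoding.UTF8.GetBytes (message);
        byte[] frame = new byte [4 + text.Length];
        BinaryPrimitives.WriteUInt32BigEndian (frame, (uint)text.Length);
        text.CopyTo (frame, 4);
        return frame;
    }

    // Reverses Frame: reads the big-endian length, then decodes that many bytes.
    public static string Unframe (byte[] frame)
    {
        uint length = BinaryPrimitives.ReadUInt32BigEndian (frame);
        return Encoding.UTF8.GetString (frame, 4, (int)length);
    }
}
```

For "Hello!", the first four bytes are 0, 0, 0, 6. A little-endian writer would produce 6, 0, 0, 0, which a default-configured DataReader would interpret as a length of about 100 million.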


The StreamSocket object that we obtained from the ConnectionReceived event
(args.Socket) has separate input and output streams. So, to write a message back,
we'd use the socket’s OutputStream. We can illustrate the use of OutputStream and
DataWriter with the corresponding client code:


async void Client()
{
  using (var socket = new StreamSocket())
  {
    await socket.ConnectAsync (new HostName ("localhost"), "51111",
                               SocketProtectionLevel.PlainSocket);
    var writer = new DataWriter (socket.OutputStream);
    string message = "Hello!";
    uint length = (uint) Encoding.UTF8.GetByteCount (message);
    writer.WriteUInt32 (length);
    writer.WriteString (message);
    await writer.StoreAsync();
  }
}

We start by directly instantiating a StreamSocket and then call ConnectAsync with 
the host name and port. (You can pass either a DNS name or an IP address string 
into HostName’s constructor.) By specifying SocketProtectionLevel.Ssl, you can 
request SSL encryption (if configured on the server). 


Again, we used a WinRT DataWriter rather than a .NET BinaryWriter and wrote 
the length of the string (measured in bytes rather than characters), followed by the 
string itself, which is UTF-8 encoded. Finally, we called StoreAsync, which writes 
the buffer to the backing stream, and closed the socket. 
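For comparison, the same length-prefixed message format can be written with ordinary .NET types. This is a minimal sketch and does not use the WinRT API. A MemoryStream stands in for the socket's stream, and WriteMessage, ReadFully, and ReadMessage are hypothetical helper names. The length goes out as a big-endian UInt32 through BinaryPrimitives, matching DataWriter's default ByteOrder.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical helper: 4-byte big-endian length prefix (in bytes), then UTF-8 text
void WriteMessage (Stream stream, string message)
{
  byte[] payload = Encoding.UTF8.GetBytes (message);
  byte[] prefix = new byte[4];
  BinaryPrimitives.WriteUInt32BigEndian (prefix, (uint) payload.Length);
  stream.Write (prefix, 0, prefix.Length);
  stream.Write (payload, 0, payload.Length);
}

// Hypothetical helper: reads exactly 'count' bytes, looping over partial reads
byte[] ReadFully (Stream stream, int count)
{
  var buffer = new byte[count];
  int offset = 0;
  while (offset < count)
  {
    int read = stream.Read (buffer, offset, count - offset);
    if (read == 0) throw new EndOfStreamException();
    offset += read;
  }
  return buffer;
}

// Hypothetical helper: reads the prefix, then that many bytes of UTF-8 text
string ReadMessage (Stream stream)
{
  uint length = BinaryPrimitives.ReadUInt32BigEndian (ReadFully (stream, 4));
  return Encoding.UTF8.GetString (ReadFully (stream, (int) length));
}

var ms = new MemoryStream();     // Stands in for the socket's stream
WriteMessage (ms, "Hello!");
ms.Position = 0;
string received = ReadMessage (ms);
Console.WriteLine (received);    // Hello!
```

Reading the prefix first tells the receiver exactly how many bytes belong to the message. That matters on a socket, where one read can return less than a whole message.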
Serialization








This chapter introduces serialization and deserialization, the mechanism by which 
objects can be represented in a flat-text or binary form. Unless otherwise stated, the 
types in this chapter all exist in the following namespaces: 


System.Runtime.Serialization
System.Xml.Serialization
System.Text.Json


We cover the data contract serializer in an online supplement. 


Serialization Concepts 


Serialization is the act of taking an in-memory object or object graph (set of objects 
that reference one another) and flattening it into a stream of bytes, XML, JSON, or a 
similar representation that can be stored or transmitted. Deserialization works in 
reverse, taking a data stream and reconstituting it into an in-memory object or 
object graph. 


Serialization and deserialization are typically used to do the following: 


• Transmit objects across a network or application boundary
• Store representations of objects within a file or database


Another, less common use is to deep-clone objects. You can also use the data
contract and XML serialization engines as general-purpose tools for loading and
saving XML files of a known structure, and the JSON serializer can do the same for
JSON files.
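Here is a minimal sketch of the deep-cloning use, with JsonSerializer (covered later in this chapter) as the engine. The object is serialized to a JSON string and immediately deserialized into a new object graph. Customer and DeepClone are hypothetical names. Only what the serializer round-trips (here, public properties) is copied, and shared references are not preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical type used only for this illustration
public class Customer
{
  public string Name { get; set; }
  public List<string> Tags { get; set; } = new List<string>();
}

// Hypothetical helper: clone by serializing to JSON and back
T DeepClone<T> (T source) =>
  JsonSerializer.Deserialize<T> (JsonSerializer.Serialize (source));

var original = new Customer { Name = "Stacey" };
original.Tags.Add ("vip");

Customer copy = DeepClone (original);
copy.Tags.Add ("new");   // Modifies only the clone's list

Console.WriteLine (original.Tags.Count);   // 1
Console.WriteLine (copy.Tags.Count);       // 2
```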


.NET Core supports serialization and deserialization both from the perspective of 
clients wanting to serialize and deserialize objects, and from the perspective of types 
wanting some control over how they are serialized. 
Serialization Engines


There are four serialization engines in .NET Core: 


• XmlSerializer (XML)
• JsonSerializer (JSON)
• The (somewhat redundant) data contract serializer (XML and JSON)
• The binary serializer (binary)


If you're serializing to XML, you can choose between XmlSerializer and the data
contract serializer. XmlSerializer offers greater flexibility in how the XML is
structured, whereas the data contract serializer has the unique ability to preserve
shared object references.


If you're serializing to JSON, you also have a choice. JsonSerializer offers the best 
performance, whereas the data contract serializer has a few extra features due to its 
longer heritage. However, if you need extra features, a better choice is likely to be 
the third-party Json.NET library. 


If you need to interoperate with legacy SOAP-based web services, the data contract 
serializer is the best choice. 


And if you don't care about the format, the binary serialization engine is the most 
powerful and easiest to use. The output, however, is not human-readable and it’s less 
version-tolerant than the other serializers. 


Table 17-1 compares each of the engines. More stars equate to a better score. 


Table 17-1. Serialization engine comparison 





Feature                               XmlSerializer  JsonSerializer  Data contract serializer  Binary serializer
Output                                XML            JSON            XML or JSON               Binary
Type coupling                         Loose          Loose           Loose                     Tight
Can deserialize subtypes              With help      No              With help                 Yes
Preserves object references           No             No              With XML                  Yes
Can serialize nonpublic fields        No             No              Yes                       Yes
Suitable for interoperable messaging  Yes            Yes             Yes                       No
Flexibility in output format          …              …               *                         -
Compact output                        …              …               …                         …

Note that the XML serialization engine requires that you recycle the same
XmlSerializer object for good performance.


Why four engines? 


The reason for there being four engines is partly historical. The .NET Framework
originally started out with two distinct goals in serialization:

• Serializing .NET object graphs with full type and reference fidelity
• Interoperating with XML and SOAP messaging standards


The first led to the binary serializer (which was used by .NET Remoting); the sec- 
ond led to the XmlSerializer (which was used by ASMX web services). 


With the release of Windows Communication Foundation (WCF) in 2006, a new 
serialization engine was required—the data contract serializer—and it was hoped 
that the new engine could largely replace the older two. However, because its design 
focused heavily on features relevant to interoperable messaging, it never fully 
achieved this goal, and the two older engines remained useful. 


WCF was designed to be format-neutral, but in practice it was shaped by the needs of
complex SOAP protocols, which later lost popularity in favor of REST and JSON.
This led, at first, to Microsoft adding JSON support to the data contract serializer, 
but eventually to the demise of WCF and its exclusion from .NET Core 3. The data 
contract serializer remains in .NET Core, although the exclusion of WCF has 
diminished its role, as has Microsoft's addition of JsonSerializer to .NET Core 3. 
It's expected that JsonSerializer will be enhanced in future .NET Core releases, 
further reducing the role of the data contract serializer. 


XmlSerializer 


The XML serialization engine can produce only XML, and it is less powerful than 
the binary and data contract serializers in saving and restoring a complex object 
graph (it cannot restore shared object references). It’s the most flexible of the four, 
however, in following an arbitrary output structure. For instance, you can choose 
whether properties are serialized to elements or attributes and the handling of a col- 
lection’s outer element. The XML engine also provides excellent version tolerance. 
XmlSerializer was used by the legacy ASMX web services. 


JsonSerializer 


The JSON serializer is fast and efficient, and was introduced relatively recently 
to .NET Core. It also offers good version tolerance and allows the use of custom 
converters for flexibility. JsonSerializer is used by ASP.NET Core 3, removing the 
dependency on Json.NET, though it is straightforward to opt back in to Json.NET 
should its features be required. 
The data contract serializer


The data contract serializer supports a data contract model that helps you decouple 
the low-level details of the types you want to serialize from the structure of the seri- 
alized data. This provides excellent version tolerance, meaning you can deserialize 
data that was serialized from an earlier or later version of a type. You can even dese- 
rialize types that have been renamed or moved to a different assembly. 


The data contract serializer can cope with most object graphs, although it can 
require more assistance than the binary serializer. You also can use it as a general- 
purpose tool for reading/writing XML files, if you're flexible on how the XML is 
structured. (If you need to store data in attributes or cope with XML elements
presented in an arbitrary order, you cannot use the data contract serializer.)


We cover the data contract serializer in an online supplement. 


The binary serializer 


The binary serialization engine is easy to use, highly automatic, and well supported 
throughout .NET Core 3 (and even more so in .NET Framework). Quite often, a 
single attribute is all that’s required to make a complex type fully serializable. The 
binary serializer is also faster than the data contract serializer when full type fidelity 
is needed. However, it tightly couples a type’s internal structure to the format of the 
serialized data, resulting in poor version tolerance (although it can tolerate the sim- 
ple addition of a field). The binary engine emits only binary data; it cannot produce 
XML or JSON in .NET Core. (In .NET Framework, there's a formatter for SOAP- 
based messaging that provides limited XML support.) 


The IXmlSerializable hook 


For complex XML serialization tasks, you can implement IXmlSerializable and do 
the serialization yourself with an XmlReader and XmlWriter. The IXmlSerializable 
interface is recognized both by XmlSerializer and by the data contract serializer, so 
you can use it selectively to handle the more complicated types. We describe
XmlReader and XmlWriter in detail in Chapter 11.


Formatters 


The output of the data contract and binary serializers is shaped by a pluggable for- 
matter. The role of a formatter is the same with both serialization engines, although 
they use completely different classes to do the job. 


A formatter shapes the final presentation to suit a particular medium or context of 
serialization. In .NET Core, the data contract serializer lets you choose between 
XML and JSON formatters, and in .NET Framework you can also choose a binary 
formatter. A binary formatter is designed to work in a context for which an arbi- 
trary stream of bytes will do—typically a file/stream or proprietary messaging 
packet. Binary output is usually smaller than XML or JSON. 
The binary serializer offers only a binary formatter in .NET Core (in .NET
Framework, there’s also a SOAP formatter for XML-based messaging).

Explicit Versus Implicit Serialization 

Serialization and deserialization can be initiated in two ways. 


The first is explicitly, by requesting that a particular object be serialized or deserial- 
ized. When you serialize or deserialize explicitly, you choose both the serialization 
engine and the formatter. 


In contrast, implicit serialization is initiated by .NET. This happens when: 


• A serializer recursively serializes a child object.
• You use a feature that relies on serialization, such as Web API.


Web API can work with either XML or JSON serialization. 


Implicit serialization is less prevalent in .NET Core than in .NET Framework, which 
includes WCF (implicitly using the data contract serializer), Remoting (implicitly 
using the binary serialization engine), and ASMX Web Services (implicitly using 
XmlSerializer). 


The XML Serializer 


The XmlSerializer class in the System.Xml.Serialization namespace serializes 
and deserializes based on attributes in your classes. 


Getting Started with Attribute-Based Serialization 


To use XmlSerializer, you instantiate it and call Serialize or Deserialize with a 
Stream and object instance. To illustrate, suppose we define the following class: 


public class Person
{
  public string Name;
  public int Age;
}


The following saves a Person to an XML file and then restores it: 


Person p = new Person();
p.Name = "Stacey"; p.Age = 30;

var xs = new XmlSerializer (typeof (Person));

using (Stream s = File.Create ("person.xml"))
  xs.Serialize (s, p);

Person p2;
using (Stream s = File.OpenRead ("person.xml"))
  p2 = (Person) xs.Deserialize (s);

Console.WriteLine (p2.Name + " " + p2.Age);   // Stacey 30


Serialize and Deserialize can work with a Stream, XmlWriter/XmlReader, or 
TextWriter/TextReader. Here's the resultant XML: 


<?xml version="1.0"?>
<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Name>Stacey</Name>
  <Age>30</Age>
</Person>
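Because Serialize and Deserialize also accept a TextWriter or TextReader, you can round-trip through a string instead of a file. This sketch uses a Person class with the same two fields and swaps in StringWriter and StringReader. The XML declaration will typically report a UTF-16 encoding, because that is the encoding a StringWriter reports.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  public string Name;
  public int Age;
}

var xs = new XmlSerializer (typeof (Person));

// Serialize into a string through a TextWriter
string xml;
using (var writer = new StringWriter())
{
  xs.Serialize (writer, new Person { Name = "Stacey", Age = 30 });
  xml = writer.ToString();
}

// Deserialize from the string through a TextReader
Person p2;
using (var reader = new StringReader (xml))
  p2 = (Person) xs.Deserialize (reader);

Console.WriteLine (p2.Name + " " + p2.Age);   // Stacey 30
```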
XmlSerializer can serialize types without any attributes—such as our Person type. 
By default, it serializes all public fields and properties on a type. You can exclude 
members that you don't want serialized by applying the XmlIgnore attribute: 


public class Person
{
  [XmlIgnore] public DateTime DateOfBirth;
}


XmlSerializer relies on a parameterless constructor for deserialization, throwing 
an exception if one is not present. (In our example, Person has an implicit parame- 
terless constructor.) This also means that field initializers execute prior to 
deserialization: 


public class Person
{
  public bool Valid = true;    // Executes before deserialization
}
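To see the parameterless constructor at work, this sketch deserializes two XML strings into an illustrative Person that has a Name field plus the Valid initializer shown above. The first string has no Valid element, so the initializer's true survives. The second string supplies a value, which overwrites it.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  public string Name;
  public bool Valid = true;   // Set by the constructor, before any XML is read
}

var xs = new XmlSerializer (typeof (Person));

// No <Valid> element: the initializer's value survives
var p1 = (Person) xs.Deserialize (
  new StringReader ("<Person><Name>Stacey</Name></Person>"));

// <Valid> present: the XML value overwrites the initializer's value
var p2 = (Person) xs.Deserialize (
  new StringReader ("<Person><Name>Stacey</Name><Valid>false</Valid></Person>"));

Console.WriteLine (p1.Valid);   // True
Console.WriteLine (p2.Valid);   // False
```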


Although XmlSerializer can serialize almost any type, it recognizes the following
types and treats them specially:
• The primitive types, DateTime, TimeSpan, Guid, and nullable versions
• byte[] (which is converted to base 64)
• An XmlAttribute or XmlElement (whose contents are injected into the stream)
• Any type implementing IXmlSerializable
• Any collection type


The deserializer is version tolerant: it doesn’t complain if elements or attributes are 
missing or if superfluous data is present. 
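Here is a quick sketch of that tolerance, using a Person class with Name and Age fields. The input omits Age and includes an unknown Nickname element, yet it deserializes without an exception. XmlSerializer reports unknown content through its UnknownElement event rather than by throwing.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  public string Name;
  public int Age;
}

var xs = new XmlSerializer (typeof (Person));

// Age is missing, and Nickname is not a member of Person
string xml = "<Person><Name>Stacey</Name><Nickname>Stace</Nickname></Person>";

var p = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p.Name + " " + p.Age);   // Stacey 0
```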


Attributes, names, and namespaces 


By default, fields and properties serialize to an XML element. You can request an 
XML attribute be used, instead, as follows: 


[XmlAttribute] public int Age; 
You can control an element or attribute’s name as follows:


public class Person
{
  [XmlElement ("FirstName")] public string Name;
  [XmlAttribute ("RoughAge")] public int Age;
}


Here’s the result: 


<Person RoughAge="30" ...> 
<FirstName>Stacey</FirstName> 
</Person> 


The default XML namespace is blank. To specify an XML namespace, [XmlElement]
and [XmlAttribute] both accept a Namespace argument. You can also assign a
name and namespace to the type itself with [XmlRoot]:


[XmlRoot ("Candidate", Namespace = "http://mynamespace/test/")]
public class Person { ... } 


This names the person element “Candidate” as well as assigning a namespace to this 
element and its children. 
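The following sketch applies that [XmlRoot] attribute to a minimal Person with a Name field, then inspects the output and reads it back. The in-memory StringWriter and the sample name are illustrative choices.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlRoot ("Candidate", Namespace = "http://mynamespace/test/")]
public class Person
{
  public string Name;
}

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, new Person { Name = "Stacey" });
string xml = writer.ToString();
Console.WriteLine (xml);   // Root element is <Candidate ...> in the given namespace

// The same serializer reads the renamed, namespaced root back in
var p2 = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p2.Name);   // Stacey
```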


XML element order 


XmlSerializer writes elements in the order in which they're defined in the class. 
You can change this by specifying an Order in the XmlElement attribute:


public class Person
{
  [XmlElement (Order = 2)] public string Name;
  [XmlElement (Order = 1)] public int Age;
}


If you use Order at all, you must use it throughout. 


The deserializer is not fussy about the order of elements—they can appear in any 
sequence and the type will properly deserialize. 
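This sketch serializes the Order-annotated Person above and confirms that Age is written before Name, even though Age is declared second. The sample values are arbitrary.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Person
{
  [XmlElement (Order = 2)] public string Name;
  [XmlElement (Order = 1)] public int Age;
}

var writer = new StringWriter();
new XmlSerializer (typeof (Person)).Serialize (writer, new Person { Name = "Stacey", Age = 30 });
string xml = writer.ToString();

// Age (Order = 1) is written before Name (Order = 2)
Console.WriteLine (xml.IndexOf ("<Age>") < xml.IndexOf ("<Name>"));   // True
```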


Subclasses and Child Objects 


Subclassing the root type 
Suppose that your root type has two subclasses, as follows: 


public class Person { public string Name; } 


public class Student : Person { } 
public class Teacher : Person { } 
and you want to write a reusable method to serialize the root type:

public void SerializePerson (Person p, string path)
{
  XmlSerializer xs = new XmlSerializer (typeof (Person));
  using (Stream s = File.Create (path))
    xs.Serialize (s, p);
}

To make this method work with a Student or Teacher, you must inform
XmlSerializer about the subclasses. There are two ways to do this. The first is to
register each subclass by applying the XmlInclude attribute:


[XmlInclude (typeof (Student))] 
[XmlInclude (typeof (Teacher))] 
public class Person { public string Name; } 


The second is to specify each of the subtypes when constructing XmlSerializer: 


XmlSerializer xs = new XmlSerializer (typeof (Person), 
new Type[] { typeof (Student), typeof (Teacher) } ); 


In either case, the serializer responds by recording the subtype in the type attribute: 


<Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="Student">
  <Name>Stacey</Name>
</Person>


The deserializer then knows from this attribute to instantiate a Student and not a
Person.

You can control the name that appears in the XML type
attribute by applying [XmlType] to the subclass:

[XmlType ("Candidate")]
public class Student : Person { }


Here's the result:

<Person xmlns:xsi="..."
        xsi:type="Candidate">
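Putting the pieces together, this sketch registers Student and Teacher with [XmlInclude] and serializes a Student through a Person serializer. It then checks that deserialization restores the subtype. In-memory strings stand in for the file used in SerializePerson.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlInclude (typeof (Student))]
[XmlInclude (typeof (Teacher))]
public class Person { public string Name; }

public class Student : Person { }
public class Teacher : Person { }

var xs = new XmlSerializer (typeof (Person));

var writer = new StringWriter();
xs.Serialize (writer, new Student { Name = "Stacey" });
string xml = writer.ToString();   // Root is <Person ... xsi:type="Student">

Person p = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p.GetType().Name);   // Student
```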


Serializing child objects 
XmlSerializer automatically recurses object references such as the HomeAddress 
field in Person: 


public class Person
{
  public string Name;
  public Address HomeAddress = new Address();
}
public class Address { public string Street, PostCode; }


To demonstrate: 


Person p = new Person { Name = "Stacey" }; 
p.HomeAddress.Street = "Odo St"; 
p.HomeAddress.PostCode = "6020"; 


Here’s the XML to which this serializes: 
<Person ... >
  <Name>Stacey</Name>
  <HomeAddress>
    <Street>Odo St</Street>
    <PostCode>6020</PostCode>
  </HomeAddress>
</Person>


If you have two fields or properties that refer to the same 
object, that object is serialized twice. If you need to preserve 
referential equality, you must use another serialization engine. 
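The following sketch illustrates this with a hypothetical WorkAddress field added alongside HomeAddress. Before serialization, both fields reference one Address. After deserialization, they reference two separate instances.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }

public class Person
{
  public string Name;
  public Address HomeAddress;
  public Address WorkAddress;   // Hypothetical second reference
}

var shared = new Address { Street = "Odo St", PostCode = "6020" };
var p = new Person { Name = "Stacey", HomeAddress = shared, WorkAddress = shared };

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);   // The address is written out twice
var p2 = (Person) xs.Deserialize (new StringReader (writer.ToString()));

Console.WriteLine (ReferenceEquals (p.HomeAddress, p.WorkAddress));     // True
Console.WriteLine (ReferenceEquals (p2.HomeAddress, p2.WorkAddress));   // False
```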


Subclassing child objects 


Suppose that you need to serialize a Person that can reference subclasses of Address, 
as follows: 


public class Address { public string Street, PostCode; } 
public class USAddress : Address { } 
public class AUAddress : Address { } 


public class Person
{
  public string Name;
  public Address HomeAddress = new USAddress();
}


There are two distinct ways to proceed, depending on how you want the XML 
structured. If you want the element name always to match the field or property 
name with the subtype recorded in a type attribute: 


<Person ...>
  <HomeAddress xsi:type="USAddress">
    ...
  </HomeAddress>
</Person>


you use [XmlInclude] to register each of the subclasses with Address, as follows: 


[XmlInclude (typeof (AUAddress))] 
[XmlInclude (typeof (USAddress))] 
public class Address
{
  public string Street, PostCode;
}


If, on the other hand, you want the element name to reflect the name of the subtype, 
to the following effect: 


<Person ...>
  <USAddress>
    ...
  </USAddress>
</Person>
you instead stack multiple [XmlElement] attributes onto the field or property in the
parent type:

public class Person
{
  public string Name;

  [XmlElement ("Address", typeof (Address))]
  [XmlElement ("AUAddress", typeof (AUAddress))]
  [XmlElement ("USAddress", typeof (USAddress))]
  public Address HomeAddress = new USAddress();
}


Each XmlElement maps an element name to a type. If you take this approach, you
don’t require the [XmlInclude] attributes on the Address type (although their
presence doesn’t break serialization).


If you omit the element name in [XmlElement] (and specify 
just a type), the type’s default name is used (which is influ- 
enced by [XmlType] but not [XmlRoot]). 
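Here is a sketch of the stacked-[XmlElement] approach. It serializes a Person whose HomeAddress holds an AUAddress (the street value is arbitrary). The element is written as <AUAddress>, and the deserializer uses that name to choose the subtype.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }
public class USAddress : Address { }
public class AUAddress : Address { }

public class Person
{
  public string Name;

  [XmlElement ("Address", typeof (Address))]
  [XmlElement ("AUAddress", typeof (AUAddress))]
  [XmlElement ("USAddress", typeof (USAddress))]
  public Address HomeAddress = new USAddress();
}

var p = new Person { Name = "Stacey", HomeAddress = new AUAddress { Street = "Odo St" } };

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);
string xml = writer.ToString();   // Contains <AUAddress>, not <HomeAddress>

var p2 = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p2.HomeAddress.GetType().Name);   // AUAddress
```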


Serializing Collections 


XmlSerializer recognizes and serializes concrete collection types without 
intervention: 


public class Person
{
  public string Name;
  public List<Address> Addresses = new List<Address>();
}


public class Address { public string Street, PostCode; } 
Here's the XML to which this serializes: 


<Person ... >
  <Name>...</Name>
  <Addresses>
    <Address>
      <Street>...</Street>
      <PostCode>...</PostCode>
    </Address>
    <Address>
      <Street>...</Street>
      <PostCode>...</PostCode>
    </Address>
  </Addresses>
</Person>


The [XmlArray] attribute lets you rename the outer element (i.e., Addresses). 


The [XmlArrayItem] attribute lets you rename the inner elements (i.e., the Address
elements).
For instance, the following class:

public class Person
{
  public string Name;

  [XmlArray ("PreviousAddresses")]
  [XmlArrayItem ("Location")]
  public List<Address> Addresses = new List<Address>();
}


serializes to this: 


<Person ... >
  <Name>...</Name>
  <PreviousAddresses>
    <Location>
      <Street>...</Street>
      <PostCode>...</PostCode>
    </Location>
    <Location>
      <Street>...</Street>
      <PostCode>...</PostCode>
    </Location>
  </PreviousAddresses>
</Person>


The XmlArray and XmlArrayItem attributes also allow you to specify XML 
namespaces. 
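The sketch below applies [XmlArray] and [XmlArrayItem] to the Person class above and adds two arbitrary addresses. It then checks the element names and the item count restored by deserialization.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }

public class Person
{
  public string Name;

  [XmlArray ("PreviousAddresses")]
  [XmlArrayItem ("Location")]
  public List<Address> Addresses = new List<Address>();
}

var p = new Person { Name = "Stacey" };
p.Addresses.Add (new Address { Street = "Odo St", PostCode = "6020" });
p.Addresses.Add (new Address { Street = "Elm Ln", PostCode = "31415" });

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);
string xml = writer.ToString();   // <PreviousAddresses> wraps two <Location> elements

var p2 = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p2.Addresses.Count);   // 2
```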


To serialize collections without the outer element, for example: 


<Person ... >
  <Name>...</Name>
  <Address>
    <Street>...</Street>
    <PostCode>...</PostCode>
  </Address>
  <Address>
    <Street>...</Street>
    <PostCode>...</PostCode>
  </Address>
</Person>


instead add [XmlElement] to the collection field or property: 


public class Person
{
  [XmlElement ("Address")]
  public List<Address> Addresses = new List<Address>();
}
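Here is a matching sketch for the flattened form, using illustrative sample data. With [XmlElement] on the list, each item is written as a direct <Address> child of <Person>, and deserialization still rebuilds the list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }

public class Person
{
  public string Name;

  [XmlElement ("Address")]
  public List<Address> Addresses = new List<Address>();
}

var p = new Person { Name = "Stacey" };
p.Addresses.Add (new Address { Street = "Odo St", PostCode = "6020" });
p.Addresses.Add (new Address { Street = "Elm Ln", PostCode = "31415" });

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);
string xml = writer.ToString();   // No <Addresses> wrapper element

var p2 = (Person) xs.Deserialize (new StringReader (xml));
Console.WriteLine (p2.Addresses.Count);   // 2
```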
Working with subclassed collection elements


The rules for subclassing collection elements follow naturally from the other sub- 
classing rules. To encode subclassed elements with the type attribute, for example: 


<Person ... >
  <Name>...</Name>
  <Addresses>
    <Address xsi:type="AUAddress">
    ...


add [XmlInclude] attributes to the base (Address) type, as we did earlier. This 
works whether or not you suppress serialization of the outer element. 


If you want subclassed elements to be named according to their type, for example: 


<Person ... >
  <Name>...</Name>
  <!-- start of optional outer element -->
  <AUAddress>
    <Street>...</Street>
    <PostCode>...</PostCode>
  </AUAddress>
  <USAddress>
    <Street>...</Street>
    <PostCode>...</PostCode>
  </USAddress>
  <!-- end of optional outer element -->
</Person>


you must stack multiple [XmlArrayItem] or [XmlElement] attributes onto the col- 
lection field or property. 


Stack multiple [XmlArrayItem] attributes if you want to include the outer collection 
element: 


[XmlArrayItem ("Address", typeof (Address))] 
[XmlArrayItem ("AUAddress", typeof (AUAddress))] 
[XmlArrayItem ("USAddress", typeof (USAddress))] 
public List<Address> Addresses = new List<Address>(); 


Stack multiple [XmlElement] attributes if you want to exclude the outer collection 
element: 


[XmlElement ("Address", typeof (Address))]
[XmlElement ("AUAddress", typeof (AUAddress))]
[XmlElement ("USAddress", typeof (USAddress))]
public List<Address> Addresses = new List<Address>();
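To check the [XmlArrayItem] variant, this sketch stores an AUAddress and a USAddress in one list (the street values are illustrative). After a round-trip, it verifies that each element name maps back to the correct subtype.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Address { public string Street, PostCode; }
public class USAddress : Address { }
public class AUAddress : Address { }

public class Person
{
  public string Name;

  [XmlArrayItem ("Address", typeof (Address))]
  [XmlArrayItem ("AUAddress", typeof (AUAddress))]
  [XmlArrayItem ("USAddress", typeof (USAddress))]
  public List<Address> Addresses = new List<Address>();
}

var p = new Person { Name = "Stacey" };
p.Addresses.Add (new AUAddress { Street = "Odo St" });
p.Addresses.Add (new USAddress { Street = "Elm Ln" });

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);
string xml = writer.ToString();   // <Addresses> wraps <AUAddress> and <USAddress>

var p2 = (Person) xs.Deserialize (new StringReader (xml));
foreach (Address a in p2.Addresses)
  Console.WriteLine (a.GetType().Name);   // AUAddress, then USAddress
```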


IXmlSerializable


Although attribute-based XML serialization is flexible, it has limitations. For 
instance, you cannot add serialization hooks—nor can you serialize nonpublic 
members. It’s also awkward to use if the XML might present the same element or 
attribute in a number of different ways. 
On that last issue, you can push the boundaries somewhat by passing an
XmlAttributeOverrides object into XmlSerializer’s constructor. There comes a
point, however, when it’s easier to take an imperative approach. This is the job of
IXmlSerializable:


public interface IXmlSerializable
{
  XmlSchema GetSchema();
  void ReadXml (XmlReader reader);
  void WriteXml (XmlWriter writer);
}


Implementing this interface gives you total control over the XML that’s read or 
written. 


A collection class that implements IXmlSerializable 
bypasses XmlSerializer’s rules for serializing collections. This 
can be useful if you need to serialize a collection with a pay- 
load—in other words, additional fields or properties that 
would otherwise be ignored. 


The rules for implementing IXmlSerializable are as follows: 


• ReadXml should read the outer start element, then the content, and then the
  outer end element.
• WriteXml should write just the content.


Here’s an example: 


using System;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Address : IXmlSerializable
{
  public string Street, PostCode;

  public XmlSchema GetSchema() { return null; }

  public void ReadXml (XmlReader reader)
  {
    reader.ReadStartElement();    // Outer start element
    Street   = reader.ReadElementContentAsString ("Street", "");
    PostCode = reader.ReadElementContentAsString ("PostCode", "");
    reader.ReadEndElement();      // Outer end element
  }

  public void WriteXml (XmlWriter writer)
  {
    // Write just the content; the serializer writes the outer element
    writer.WriteElementString ("Street", Street);
    writer.WriteElementString ("PostCode", PostCode);
  }
}


Serializing and deserializing an instance of Address via XmlSerializer
automatically calls the WriteXml and ReadXml methods. Further, if Person were
defined like this:

public class Person
{
  public string Name;
  public Address HomeAddress;
}

IXmlSerializable would be called upon selectively to serialize the HomeAddress
field.
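The following sketch pairs the Address class above with that Person definition and round-trips it through XmlSerializer. It relies on the element order Street, then PostCode, because that is the order ReadXml reads them in. A more defensive ReadXml would accept other orders.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Address : IXmlSerializable
{
  public string Street, PostCode;

  public XmlSchema GetSchema() { return null; }

  public void ReadXml (XmlReader reader)
  {
    reader.ReadStartElement();
    Street   = reader.ReadElementContentAsString ("Street", "");
    PostCode = reader.ReadElementContentAsString ("PostCode", "");
    reader.ReadEndElement();
  }

  public void WriteXml (XmlWriter writer)
  {
    writer.WriteElementString ("Street", Street);
    writer.WriteElementString ("PostCode", PostCode);
  }
}

public class Person
{
  public string Name;
  public Address HomeAddress;   // Handled by Address's IXmlSerializable methods
}

var p = new Person
{
  Name = "Stacey",
  HomeAddress = new Address { Street = "Odo St", PostCode = "6020" }
};

var xs = new XmlSerializer (typeof (Person));
var writer = new StringWriter();
xs.Serialize (writer, p);

var p2 = (Person) xs.Deserialize (new StringReader (writer.ToString()));
Console.WriteLine (p2.HomeAddress.Street + ", " + p2.HomeAddress.PostCode);   // Odo St, 6020
```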


We describe XmlReader and XmlWriter at length in the first section of Chapter 11. 
Also in Chapter 11, in “Patterns for Using XmlReader/XmlWriter” on page 511 we 
provide examples of IXmlSerializable-ready classes. 


The JSON Serializer 


JsonSerializer (in the System.Text.Json namespace) is straightforward to use 
because of the simplicity of the JSON format. The root of a JSON document is either 
an array or an object. Under that root are properties, which can be an object, array, 
string, number, "true", "false", or "null". The JSON serializer directly maps class 
property names to property names in JSON. 


Getting Started 


Assuming Person is defined like this: 


public class Person
{
  public string Name { get; set; }
}


we can serialize it to a JSON string by calling JsonSerializer.Serialize:


var p = new Person { Name = "Ian" }; 
string json = JsonSerializer.Serialize (p, 
new JsonSerializerOptions { WriteIndented = true }); 


Here is the result: 


{
  "Name": "Ian"
}


The JsonSerializer.Deserialize method does the reverse, and deserializes:

Person p2 = JsonSerializer.Deserialize<Person> (json);


The JSON serializer ignores fields, and serializes only properties. 
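Here is a minimal sketch of that rule. The illustrative Person below has a Name property and an Age field, and only the property appears in the output.

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string Name { get; set; }   // Property: serialized
  public int Age;                    // Field: ignored
}

string json = JsonSerializer.Serialize (new Person { Name = "Ian", Age = 30 });
Console.WriteLine (json);   // {"Name":"Ian"}
```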
The JSON serializer requires that your properties have public
get and set accessors, which means that it cannot deserialize
immutable classes or structs whose properties are initialized
through a constructor. This limitation might be relaxed in
subsequent releases.


Serializing Child Objects 
Suppose that we define Person to have a home and work Address: 


public class Address
{
  public string Street { get; set; }
  public string PostCode { get; set; }
}

public class Person
{
  public string Name { get; set; }
  public Address HomeAddress { get; set; }
  public Address WorkAddress { get; set; }
}


We can serialize this with no extra work: 


var home = new Address { Street = "1 Main St.", PostCode="11235" }; 
var work = new Address { Street = "4 Elm Ln.", PostCode="31415" }; 
var p = new Person { Name = "Ian", HomeAddress = home, WorkAddress = work }; 


Console.WriteLine (JsonSerializer.Serialize (p, 
new JsonSerializerOptions { WriteIndented = true } )); 


Upon encountering HomeAddress and WorkAddress, the serializer creates JSON 
objects: 


{
  "Name": "Ian",
  "HomeAddress": {
    "Street": "1 Main St.",
    "PostCode": "11235"
  },
  "WorkAddress": {
    "Street": "4 Elm Ln.",
    "PostCode": "31415"
  }
}


Note, though, what happens when we set HomeAddress and WorkAddress to the 
same object instance: 


var p = new Person { Name = "Ian", HomeAddress = home, WorkAddress = home };

Here’s the output:

{
  "Name": "Ian",
  "HomeAddress": {
    "Street": "1 Main St.",
    "PostCode": "11235"
  },
  "WorkAddress": {
    "Street": "1 Main St.",
    "PostCode": "11235"
  }
}

There is no information in the JSON to indicate that HomeAddress and WorkAddress 
were originally the same object instance. When deserialized, two separate instances 
of Address will be created and assigned to the respective properties. 
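This sketch confirms that behavior with the Person and Address classes above. The two properties share one instance before serialization, but not after deserialization.

```csharp
using System;
using System.Text.Json;

public class Address
{
  public string Street { get; set; }
  public string PostCode { get; set; }
}

public class Person
{
  public string Name { get; set; }
  public Address HomeAddress { get; set; }
  public Address WorkAddress { get; set; }
}

var home = new Address { Street = "1 Main St.", PostCode = "11235" };
var p = new Person { Name = "Ian", HomeAddress = home, WorkAddress = home };

string json = JsonSerializer.Serialize (p);
Person p2 = JsonSerializer.Deserialize<Person> (json);

Console.WriteLine (ReferenceEquals (p.HomeAddress, p.WorkAddress));     // True
Console.WriteLine (ReferenceEquals (p2.HomeAddress, p2.WorkAddress));   // False
```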


This also means that JsonSerializer cannot handle cycles in the object graph. To 
illustrate, suppose that we add a Partner property to our Person class: 


public class Person
{
  public Person Partner { get; set; }
}


The following throws a JsonException because sara and ian contain a reference to 
each other: 


var sara = new Person { Name = "Sara" }; 

var ian = new Person { Name = "Ian", Partner = sara }; 
sara.Partner = ian; 

string json = JsonSerializer.Serialize (ian); // throws 


Support for cyclic references is planned in .NET Core 5.0. 
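Because the failure surfaces as a JsonException, you can catch it to detect a cycle. Here is a minimal sketch, assuming a Person with Name and Partner properties. The exact message text is not specified here and may vary between versions.

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string Name { get; set; }
  public Person Partner { get; set; }
}

var sara = new Person { Name = "Sara" };
var ian = new Person { Name = "Ian", Partner = sara };
sara.Partner = ian;   // Creates a cycle: ian -> sara -> ian

bool cycleDetected = false;
try
{
  JsonSerializer.Serialize (ian);
}
catch (JsonException ex)
{
  cycleDetected = true;
  Console.WriteLine (ex.Message);
}
Console.WriteLine (cycleDetected);   // True
```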


Serializing Collections 


JsonSerializer automatically serializes collections. Collections can appear in an 
object’s properties as well as in the root object itself. We can illustrate the latter by 
using the Person and Address classes that we defined at the beginning of the pre- 
ceding section: 


var sara = new Person { Name = "Sara" }; 
var ian = new Person { Name = "Ian" }; 


Console.WriteLine (JsonSerializer.Serialize (new[] { sara, ian }, 
new JsonSerializerOptions { WriteIndented = true })); 


Here’s the result: 


[
  {
    "Name": "Sara"
  },
  {
    "Name": "Ian"
  }
]
The following deserializes the JSON:

Person[] people = JsonSerializer.Deserialize<Person[]> (json);

It is possible to serialize a collection containing differently typed objects:


var sara = new Person { Name = "Sara" }; 
var addr = new Address { Street = "1 Main St.", PostCode = "11235" }; 


Console.WriteLine (JsonSerializer.Serialize (new object[] { sara, addr }, 
new JsonSerializerOptions { WriteIndented = true })); 


This yields the following: 


[ 
{ 
"Name": "Sara" 
}, 
{ 
"Street": "1 Main St.", 
"PostCode": "11235" 
} 
] 


Deserializing such collections is clumsy because the type of each element is not
written into the JSON. You need to take the low-level approach of deserializing to
JsonElement[] and then enumerating each property:


var deserialized = JsonSerializer.Deserialize<JsonElement[]> (json);
foreach (var element in deserialized)
{
  foreach (var prop in element.EnumerateObject())
    Console.WriteLine ($"{prop.Name}: {prop.Value}");
  Console.WriteLine ("---");
}

// Output:
Name: Sara
---
Street: 1 Main St.
PostCode: 11235
---


We describe how to use JsonElement in “JsonDocument” on page 519. 


Controlling Serialization with Attributes 


You can control the serialization process with attributes defined in the
System.Text.Json.Serialization namespace.


JsonIgnoreAttribute


By default, the JSON serializer serializes all properties unless you opt out by apply- 
ing the JsonIgnore attribute: 
public class Person
{
  public string Name { get; set; }

  [JsonIgnore]
  public decimal NetWorth { get; set; }   // Not serialized
}
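Here is a quick check of [JsonIgnore], using the Person class above with arbitrary values. NetWorth does not appear in the output.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Person
{
  public string Name { get; set; }

  [JsonIgnore]
  public decimal NetWorth { get; set; }   // Not serialized
}

string json = JsonSerializer.Serialize (new Person { Name = "Ian", NetWorth = 1000000m });
Console.WriteLine (json);   // {"Name":"Ian"}
```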


JsonPropertyNameAttribute 


If the JSON property name differs from the C# property name, you can create a 
mapping with [JsonPropertyName]. For example, if the JSON property name is 
"FullName", and the C# property name is Name, we could create a mapping, as 
follows: 


public class Person
{
  [JsonPropertyName ("FullName")]
  public string Name { get; set; }
}


This serializes to the following:

{
  "FullName":"...",
  ...
}
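The mapping applies in both directions. Here is a minimal sketch with the Person class above; the names are sample data.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Person
{
  [JsonPropertyName ("FullName")]
  public string Name { get; set; }
}

// Serializing uses the mapped name...
string json = JsonSerializer.Serialize (new Person { Name = "Sara" });
Console.WriteLine (json);   // {"FullName":"Sara"}

// ...and so does deserializing
Person p = JsonSerializer.Deserialize<Person> ("{\"FullName\":\"Ian\"}");
Console.WriteLine (p.Name);   // Ian
```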


JsonExtensionDataAttribute 


Consider a web API that returns instances of a Person class and a client that uses 
the API. Both are maintained by different organizations. If the API author adds a 
new property to the Person class (such as Age), the client is still able to deserialize 
the JSON with its old Person class, because it will simply skip over the unknown 
Age property. However, suppose that the client then updates its instance of Person, 
serializes it, and sends it back to the API. The original Age value is then lost. 


To illustrate, we'll have the web API define Person as: 


public class Person   // v2
{
  public int Id { get; set; }
  public string Name { get; set; }
  public int Age { get; set; }   // New property
}


which would generate JSON like this: 


{
  "Id": 27182,
  "Name": "Sara",
  "Age": 35
}
If we deserialize that JSON into an older version of the class (without the Age
property):

public class Person   // v1
{
  public int Id { get; set; }
  public string Name { get; set; }
}


the age information has no place to go. 


If we later serialize our version and send it back to the API, our JSON will not con- 
tain an Age property, and the API will interpret Age to be zero (the default value for 
an integer). 


JsonExtensionDataAttribute solves that problem by providing a mechanism to 
store all unrecognized properties so that their values can be used when reserializing. 
When the attribute is placed on a property of type IDictionary<string, TValue> 
(TValue must be object or JsonElement), the serializer uses that property to persist 
the unrecognized JSON properties; no information is lost: 


public class Person
{
  public int Id { get; set; }
  public string Name { get; set; }

  [JsonExtensionData]
  public IDictionary<string, JsonElement> Storage { get; set; } =
    new Dictionary<string, JsonElement>();
}
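To see the round trip in action, the following sketch deserializes the v2 JSON into the v1 class above, modifies the object and serializes it again. ExtensionDataDemo and RoundTrip are illustrative names and are not part of any API. Age survives because the serializer parks it in Storage:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public class Person   // v1: has no Age property
{
  public int Id { get; set; }
  public string Name { get; set; }

  // Any JSON property without a matching C# property lands here
  [JsonExtensionData]
  public IDictionary<string, JsonElement> Storage { get; set; } =
    new Dictionary<string, JsonElement>();
}

public static class ExtensionDataDemo
{
  // Deserialize (possibly newer) JSON, update the object, and serialize it back
  public static string RoundTrip (string json)
  {
    Person p = JsonSerializer.Deserialize<Person> (json);
    p.Name = "Sara Jones";                 // The client modifies its copy
    return JsonSerializer.Serialize (p);   // Storage entries are written back out
  }

  public static void Main()
  {
    string v2Json = "{\"Id\":27182,\"Name\":\"Sara\",\"Age\":35}";
    Console.WriteLine (RoundTrip (v2Json));   // Still includes "Age":35
  }
}
```

Note that the Storage property itself never appears in the JSON. Its entries are written as ordinary top-level properties.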


JsonConverterAttribute 


This attribute is used to specify a type used to convert data to and from JSON. We 
discuss this further in the next section. 


Customizing Data Conversion 


Suppose that you need to interoperate with an API provider that encodes dates with
the Unix timestamp format (the number of seconds since midnight, January 1, 1970, UTC):


{
  "Id":27182,
  "Name":"Sara",
  "Born":464572800   // Number of seconds since 1/1/1970
}


We would like to deserialize this into a class that uses the .NET DateTime class: 
public class Person
{ 
public int Id { get; set; } 
public string Name { get; set; } 
public DateTime Born { get; set; } 
} 
We can achieve this by writing a custom data converter:


public class UnixTimestampConverter : JsonConverter<DateTime>
{
  public override DateTime Read (ref Utf8JsonReader reader, Type type,
                                 JsonSerializerOptions options)
  {
    // TryGetInt32 throws if the current token isn't a number, so check first
    if (reader.TokenType == JsonTokenType.Number &&
        reader.TryGetInt32 (out int timestamp))
      return new DateTime (1970, 1, 1).AddSeconds (timestamp);

    throw new JsonException ("Expected the timestamp as a number.");
  }

  public override void Write (Utf8JsonWriter writer, DateTime value,
                              JsonSerializerOptions options)
  {
    int timestamp = (int)(value - new DateTime (1970, 1, 1)).TotalSeconds;
    writer.WriteNumberValue (timestamp);
  }
}


Then we can either apply the [JsonConverter] attribute to the properties that we
want to convert:


[JsonConverter (typeof (UnixTimestampConverter))]
public DateTime Born { get; set; }


or, if the API is consistent in its representation of data types, make the converter act 
as a default: 


JsonSerializerOptions opts = new JsonSerializerOptions(); 
opts.Converters.Add (new UnixTimestampConverter()); 
var sara = JsonSerializer.Deserialize<Person> (json, opts); 


The latter instructs the serializer to use UnixTimestampConverter every time it 
encounters a DateTime. 
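Putting the pieces together, here is a runnable sketch that registers the converter through JsonSerializerOptions and round-trips the sample JSON. The ConverterDemo class and its Parse and Write helpers are hypothetical. As in the converter above, the epoch is a DateTime with unspecified kind:

```csharp
using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;

public class UnixTimestampConverter : JsonConverter<DateTime>
{
  static readonly DateTime Epoch = new DateTime (1970, 1, 1);

  public override DateTime Read (ref Utf8JsonReader reader, Type type,
                                 JsonSerializerOptions options)
  {
    // Accept only a JSON number that fits in an int
    if (reader.TokenType == JsonTokenType.Number &&
        reader.TryGetInt32 (out int timestamp))
      return Epoch.AddSeconds (timestamp);

    throw new JsonException ("Expected the timestamp as a number.");
  }

  public override void Write (Utf8JsonWriter writer, DateTime value,
                              JsonSerializerOptions options)
    => writer.WriteNumberValue ((int)(value - Epoch).TotalSeconds);
}

public class Person
{
  public int Id { get; set; }
  public string Name { get; set; }
  public DateTime Born { get; set; }
}

public static class ConverterDemo
{
  // Adding the converter to Converters makes it the default for every DateTime
  public static readonly JsonSerializerOptions Options = CreateOptions();

  static JsonSerializerOptions CreateOptions()
  {
    var opts = new JsonSerializerOptions();
    opts.Converters.Add (new UnixTimestampConverter());
    return opts;
  }

  public static Person Parse (string json) =>
    JsonSerializer.Deserialize<Person> (json, Options);

  public static string Write (Person p) => JsonSerializer.Serialize (p, Options);

  public static void Main()
  {
    Person sara = Parse ("{\"Id\":27182,\"Name\":\"Sara\",\"Born\":464572800}");
    Console.WriteLine (sara.Born.ToString ("yyyy-MM-dd", CultureInfo.InvariantCulture));
    // 1984-09-21
    Console.WriteLine (Write (sara));
    // {"Id":27182,"Name":"Sara","Born":464572800}
  }
}
```

Throwing JsonException (rather than a plain Exception) lets the serializer report the failure as a JSON error.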


JSON Serialization Options 


The serializer accepts an optional JsonSerializerOptions parameter, allowing
additional control over the serialization and deserialization process. The following
subsections present the most useful options.


WriteIndented


We have set WriteIndented to true throughout this section to instruct the serializer 
to emit whitespace to generate more human-readable JSON. The default is false, 
which results in everything being crammed onto one line. 


AllowTrailingCommas 


The JSON spec requires properties and array elements to be comma separated but 
does not allow trailing commas: 
{
  "Name":"Dylan",
  "LuckyNumbers":[10, 7, ],
  "Age":46,
}

The trailing commas after 7 and 46 are not allowed by default. To enable them, do
this:


var commaTolerant = JsonSerializer.Deserialize<Person> (brokenJson, 
new JsonSerializerOptions { AllowTrailingCommas = true }); 
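The following sketch confirms that the same input is rejected with a JsonException by default and accepted once AllowTrailingCommas is set. TrailingCommaDemo and Parses are hypothetical names:

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string Name { get; set; }
  public int[] LuckyNumbers { get; set; }
  public int Age { get; set; }
}

public static class TrailingCommaDemo
{
  const string BrokenJson =
    "{ \"Name\":\"Dylan\", \"LuckyNumbers\":[10, 7, ], \"Age\":46, }";

  // Returns true if BrokenJson parses with the given AllowTrailingCommas setting
  public static bool Parses (bool allowTrailingCommas)
  {
    try
    {
      JsonSerializer.Deserialize<Person> (BrokenJson,
        new JsonSerializerOptions { AllowTrailingCommas = allowTrailingCommas });
      return true;
    }
    catch (JsonException)
    {
      return false;
    }
  }

  public static void Main()
  {
    Console.WriteLine (Parses (false));   // False
    Console.WriteLine (Parses (true));    // True
  }
}
```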


ReadCommentHandling 


By default, the deserializer throws an exception when encountering comments
(because comments are not part of the official JSON standard). Setting
ReadCommentHandling to JsonCommentHandling.Skip instructs the deserializer to skip
over them instead, so the following can be successfully parsed:


{
  "Name":"Dylan"   // Comment here
  /* This is another comment */
}
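Here's a minimal sketch of setting the option, using the commented JSON above. CommentDemo and ReadName are illustrative names:

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string Name { get; set; }
}

public static class CommentDemo
{
  const string JsonWithComments =
    "{\n  \"Name\":\"Dylan\"   // Comment here\n  /* This is another comment */\n}";

  public static string ReadName()
  {
    var options = new JsonSerializerOptions
    {
      // Skip comments rather than throwing a JsonException
      ReadCommentHandling = JsonCommentHandling.Skip
    };
    return JsonSerializer.Deserialize<Person> (JsonWithComments, options).Name;
  }

  public static void Main() => Console.WriteLine (ReadName());   // Dylan
}
```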

PropertyNameCaseInsensitive


By default, the deserializer is case sensitive when matching JSON property names to 
C# property names. This means that the following input: 


{ "name":"Dylan" } 


would fail to populate the Name property in our Person class (the JSON property 
would be ignored). 


Setting PropertyNameCaseInsensitive to true solves this problem by instructing 
the deserializer to perform case-insensitive matching (at a small performance cost): 


var dylan = JsonSerializer.Deserialize<Person> (json, 
new JsonSerializerOptions { PropertyNameCaseInsensitive = true }); 


If the input has predictable casing, another solution is to use the JsonPropertyName 
attribute (described earlier) or the PropertyNamingPolicy option (described next). 
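A quick way to see the difference is to deserialize the same input with and without the option. CaseDemo and ReadName are hypothetical names used for illustration:

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string Name { get; set; }
}

public static class CaseDemo
{
  const string Json = "{ \"name\":\"Dylan\" }";

  // With case-sensitive matching, "name" does not match Name and is ignored
  public static string ReadName (bool caseInsensitive) =>
    JsonSerializer.Deserialize<Person> (Json,
      new JsonSerializerOptions { PropertyNameCaseInsensitive = caseInsensitive })
    .Name;

  public static void Main()
  {
    Console.WriteLine (ReadName (false) ?? "(null)");   // (null)
    Console.WriteLine (ReadName (true));                // Dylan
  }
}
```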


PropertyNamingPolicy 


To better support the popular camel-case property naming convention, .NET Core
3 introduced PropertyNamingPolicy. It provides better performance than the
just-described PropertyNameCaseInsensitive option and applies to both serialization
and deserialization. Thus, the code:


var dylan = new Person { Name = "Dylan" };

var json = JsonSerializer.Serialize (dylan,
  new JsonSerializerOptions
  {
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
  });

yields:

{"name":"Dylan"}

which can be deserialized in the same way:


var dylan2 = JsonSerializer.Deserialize<Person> (json,
  new JsonSerializerOptions
  {
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
  });


DictionaryKeyPolicy 


With the DictionaryKeyPolicy option, you can force dictionary keys to serialize with
camel casing (the policy is not applied when deserializing):


var dict = new Dictionary<string, string>
{
  { "BookName", "Nutshell" },
  { "BookVersion", "8.0" }
};

Console.WriteLine (JsonSerializer.Serialize (dict,
  new JsonSerializerOptions
  {
    WriteIndented = true,
    DictionaryKeyPolicy = JsonNamingPolicy.CamelCase
  }));

This outputs the following:

{
  "bookName": "Nutshell",
  "bookVersion": "8.0"
}

Encoder 


The default text encoder aggressively escapes characters such that the output can 
appear in an HTML document without additional processing: 


string dylan = "<b>Dylan & Friends</b>"; 
Console.WriteLine (JsonSerializer.Serialize (dylan)); 


Here's the output:

"\u003Cb\u003EDylan \u0026 Friends\u003C/b\u003E"

You can prevent this by changing the Encoder: 


Console.WriteLine (JsonSerializer.Serialize (dylan,
  new JsonSerializerOptions {
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
  }));

This yields the following output:

"<b>Dylan & Friends</b>"


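UnsafeRelaxedJsonEscaping is a static property that returns an instance of a
subclass of System.Text.Encodings.Web.JavaScriptEncoder. As its name suggests, it
doesn't escape HTML-sensitive characters, so it is unsafe to use when the output is
embedded directly in an HTML page. Should the need arise, you can implement your
own JavaScriptEncoder subclass for complete control over the encoding process.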
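The following sketch puts the two encoders side by side. EncoderDemo is a hypothetical name. Only the relaxed output is shown in a comment, because that one contains no escape sequences:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

public static class EncoderDemo
{
  const string Dylan = "<b>Dylan & Friends</b>";

  // Default encoder: escapes HTML-sensitive characters such as <, > and &
  public static string Default() => JsonSerializer.Serialize (Dylan);

  // Relaxed encoder: leaves those characters as is
  public static string Relaxed() => JsonSerializer.Serialize (Dylan,
    new JsonSerializerOptions
    {
      Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
    });

  public static void Main()
  {
    Console.WriteLine (Default());
    Console.WriteLine (Relaxed());   // "<b>Dylan & Friends</b>"
  }
}
```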


IgnoreNullValues 
By default, null property values are included in the JSON output, so: 
var person = new Person { Name = null }; 


would serialize to: 


{ 


"Name": null 


} 


With IgnoreNullValues set to true, null-value properties are completely ignored: 


Console.WriteLine (JsonSerializer.Serialize (person,
  new JsonSerializerOptions { IgnoreNullValues = true }));


Here's the output: 


{} 


IgnoreReadOnlyProperties 


By default, read-only properties are serialized (but not deserialized, because there is 
no set accessor). You can tell the serializer to ignore read-only properties by setting 
IgnoreReadOnlyProperties to true. 
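To illustrate, the following sketch gives a hypothetical Person class a computed, get-only FullName property and serializes it with the option off and on. ReadOnlyDemo is an illustrative name:

```csharp
using System;
using System.Text.Json;

public class Person
{
  public string First { get; set; }
  public string Last { get; set; }

  // Read-only (no set accessor): serialized by default, never deserialized
  public string FullName => First + " " + Last;
}

public static class ReadOnlyDemo
{
  public static string Serialize (bool ignoreReadOnly) =>
    JsonSerializer.Serialize (new Person { First = "Sara", Last = "Jones" },
      new JsonSerializerOptions { IgnoreReadOnlyProperties = ignoreReadOnly });

  public static void Main()
  {
    Console.WriteLine (Serialize (false));
    // {"First":"Sara","Last":"Jones","FullName":"Sara Jones"}
    Console.WriteLine (Serialize (true));
    // {"First":"Sara","Last":"Jones"}
  }
}
```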


The Binary Serializer 


The binary serialization engine saves and restores objects with full type and
reference fidelity, and you can use it to perform such tasks as saving and restoring
objects to disk. The binary serializer is highly automated and can handle complex
object graphs with minimum intervention. It’s not available, however, in Windows
Store apps.


There are two ways to make a type support binary serialization. The first is
attribute-based; the second involves implementing ISerializable. Adding
attributes is simpler; implementing ISerializable is more flexible. You typically
implement ISerializable to do the following:

• Dynamically control what gets serialized.

• Make your serializable type friendly to being subclassed by other parties.


Getting Started 
You can make a type serializable by applying a single attribute: 


[Serializable] public sealed class Person
{
  public string Name;
  public int Age;
}

The [Serializable] attribute instructs the serializer to include all fields in the type.
This includes both private and public fields (but not properties). Every field must
itself be serializable; otherwise, an exception is thrown. Primitive .NET types such
as string and int support serialization (as do many other .NET types).


The Serializable attribute is not inherited, so subclasses are 
not automatically serializable, unless also marked with this 
attribute. 


To serialize an instance of Person, you instantiate BinaryFormatter (in
System.Runtime.Serialization.Formatters.Binary) and call Serialize.


.NET Framework also offers a SoapFormatter that you can
use in the same way to generate SOAP-compatible XML
output. It’s less functional than BinaryFormatter: it supports
neither generic types nor the filtering of extraneous data
necessary for version-tolerant serialization.


The following serializes a Person with a BinaryFormatter: 


Person p = new Person() { Name = "George", Age = 25 }; 
IFormatter formatter = new BinaryFormatter(); 


using (FileStream s = File.Create ("serialized.bin")) 
formatter.Serialize (s, p); 


All of the data necessary to reconstruct the Person object is written to the file
serialized.bin. The Deserialize method restores the object:


using (FileStream s = File.OpenRead ("serialized.bin"))
{
  Person p2 = (Person) formatter.Deserialize (s);
  Console.WriteLine (p2.Name + " " + p2.Age);    // George 25
}
The deserializer bypasses all constructors and field initializers
when re-creating objects. Behind the scenes, it calls
FormatterServices.GetUninitializedObject to do this job. You can
call this method yourself to implement some very grubby
design patterns!
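You can watch this behavior directly. FormatterServices is marked obsolete in recent versions of .NET, so the sketch below swaps in RuntimeHelpers.GetUninitializedObject (in System.Runtime.CompilerServices). That method likewise creates an object without running its constructor or field initializers. The Person class and UninitializedDemo are hypothetical:

```csharp
using System;
using System.Runtime.CompilerServices;

public class Person
{
  public string Name = "(no name)";   // Field initializer
  public bool Valid;

  public Person() => Valid = true;    // Constructor
}

public static class UninitializedDemo
{
  // Allocates a Person without running its field initializers or constructor
  public static Person CreateRaw() =>
    (Person) RuntimeHelpers.GetUninitializedObject (typeof (Person));

  public static void Main()
  {
    Person normal = new Person();
    Person raw = CreateRaw();
    Console.WriteLine (normal.Name + " " + normal.Valid);         // (no name) True
    Console.WriteLine ((raw.Name ?? "null") + " " + raw.Valid);   // null False
  }
}
```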


The serialized data includes full type and assembly information, so if we tried to
cast the result of deserialization to a matching Person type in a different assembly,
an error would result. The deserializer fully restores object references to their
original state upon deserialization. This includes collections, which are just treated
as serializable objects like any other (the standard collection types in
System.Collections and System.Collections.Generic are marked as serializable).


The binary engine can handle large, complex object graphs
without special assistance (other than ensuring that all
participating members are serializable). One thing to be wary of is
that the serializer’s performance degrades in proportion to the
number of references in your object graph. This can become
an issue in a Remoting server that has to process many
concurrent requests.


Binary Serialization Attributes 


[NonSerialized] 


By default, all fields are serialized. Fields that you don’t want serialized, such as 
those used for temporary calculations or for storing file or window handles, you 
must mark explicitly with the [NonSerialized] attribute: 


[Serializable] public sealed class Person
{
  public string Name;
  [NonSerialized] public int Age;
}

This instructs the serializer to ignore the Age member.

Nonserialized members are always empty or null (their default value) when
deserialized, even if field initializers or constructors set them otherwise.


[OnDeserializing] 


A method marked with the [OnDeserializing] attribute fires just prior to
deserialization and acts as a kind of constructor. This can be important because the
binary deserializer bypasses all your normal constructors as well as field initializers.
In the following example, we define a field called Valid, which we exclude from
serialization with the [NonSerialized] attribute:

[Serializable] public sealed class Person
{
  public string Name;
  [NonSerialized] public bool Valid = true;

  public Person() => Valid = true;
}
A deserialized Person will never be Valid, despite the constructor and field
initializer both setting Valid to true. We can solve this by writing a method that acts
as a special deserialization constructor:


[OnDeserializing]
void OnDeserializing (StreamingContext context) => Valid = true;


[OnDeserialized] 


A method marked with the [OnDeserialized] attribute fires just after
deserialization. This can be useful for updating calculated fields, and in conjunction
with [OnSerializing], which we look at next.


[OnSerializing] and [OnSerialized] 


The [OnSerializing] and [OnSerialized] attributes mark methods for execution 
before or after serialization. 


[OnSerializing] is useful for populating a field that’s used only for serialization. To 
illustrate, suppose that you want to make the following class serializable: 


class Foo
{
  public XDocument Xml;
}


The difficulty is that XDocument (in the System.Xml.Linq namespace) is not itself 
serializable. We can solve this by applying the [NonSerialized] attribute to the Xml 
field and then defining an [OnSerializing] method that writes the content of the 
XDocument to a string field (that we do serialize): 


[Serializable]
class Foo
{
  [NonSerialized]
  public XDocument Xml;

  string _xmlString;   // used only for serialization

  [OnSerializing]
  void OnSerializing (StreamingContext context)
    => _xmlString = Xml.ToString();
}


The final step is to reconstruct the XDocument when deserializing. We can do this
by adding an [OnDeserialized] method:
[OnDeserialized]
void OnDeserialized (StreamingContext context) 
=> Xml = XDocument.Parse (_xmlString); 


[OptionalField] and Versioning 


Adding or removing fields doesn’t break compatibility with already serialized data: 
the deserializer skips over data for which there's no matching field. When adding a 
field, you can apply the following attribute to remind yourself that it might be 
absent from data serialized by an older version of the software: 


[Serializable] public sealed class Person
{
  public string Name;
  [OptionalField (VersionAdded = 2)] public DateTime DateOfBirth;
}


This serves as documentation and has no effect on serialization semantics. 


If versioning robustness is important, avoid renaming fields 
and avoid retrospectively adding the NonSerialized attribute. 
Never change a field’s type. 


Binary Serialization with ISerializable 


Implementing ISerializable gives a type complete control over its binary
serialization and deserialization.


Here’s the ISerializable interface definition: 


public interface ISerializable
{
  void GetObjectData (SerializationInfo info, StreamingContext context);
}


GetObjectData fires upon serialization; its job is to populate the SerializationInfo
object (a name-value dictionary) with data from all fields that you want
serialized. Here’s how we would write a GetObjectData method that serializes two
fields, called Name and DateOfBirth:


public virtual void GetObjectData (SerializationInfo info,
                                   StreamingContext context)
{
  info.AddValue ("Name", Name);
  info.AddValue ("DateOfBirth", DateOfBirth);
}


In this example, we've chosen to name each item according to its corresponding
field. This is not required; you can use any name, but you must use the same name
upon deserialization. The values themselves can be of any serializable type; the
serialization will continue recursively as necessary. It’s legal to store null values in
the dictionary.
It's a good idea to make the GetObjectData method virtual,
unless your class is sealed. This allows subclasses to extend
serialization without having to reimplement the interface.


SerializationInfo also contains properties that you can use to control the type 
and assembly into which the instance should deserialize. 


In addition to implementing ISerializable, a type controlling its own serialization
needs to provide a deserialization constructor that takes the same two parameters as
GetObjectData. The constructor can be declared with any accessibility and the
runtime will still find it. Typically, though, you would declare it protected so that
subclasses can call it.


In the following example, we define Player and Team classes, following the
principles of immutability (with everything read-only). But because the immutable
collections are not serializable, we need to take control over the serialization
process by implementing ISerializable:


[Serializable] public class Player
{
  public readonly string Name;
  public Player (string name) => Name = name;
}

[Serializable] public class Team : ISerializable
{
  public readonly string Name;
  public readonly ImmutableList<Player> Players;   // Not serializable!

  public Team (string name, params Player[] players)
  {
    Name = name;
    Players = players.ToImmutableList();
  }

  // Serialize the object:
  public virtual void GetObjectData (SerializationInfo si,
                                     StreamingContext sc)
  {
    si.AddValue ("Name", Name);
    // Convert Players to an ordinary serializable array:
    si.AddValue ("PlayerData", Players.ToArray());
  }

  // Deserialize the object:
  protected Team (SerializationInfo si, StreamingContext sc)
  {
    Name = si.GetString ("Name");

    // Deserialize Players to an array to match our serialization:
    Player[] p = (Player[]) si.GetValue ("PlayerData", typeof (Player[]));

    // Construct a new immutable List using this array:
    Players = p.ToImmutableList();
  }
}
(You could also solve this problem by using the [OnSerializing] and
[OnDeserialized] attributes that we discussed earlier.)


For commonly used types, the SerializationInfo class has typed “Get” methods,
such as GetString, in order to make writing deserialization constructors easier. If
you specify a name for which no data exists, an exception is thrown. This happens
most often when there’s a version mismatch between the code doing the
serialization and deserialization. You've added an extra field, for instance, and then
forgotten about the implications of deserializing an old instance. To work around
this problem, you can do either of the following:


• Add exception handling around code that retrieves a data member added in a
  later version.

• Implement your own version numbering system; for example:


public string MyNewField;

public virtual void GetObjectData (SerializationInfo si,
                                   StreamingContext sc)
{
  si.AddValue ("_version", 2);
  si.AddValue ("MyNewField", MyNewField);
  ...
}

protected Team (SerializationInfo si, StreamingContext sc)
{
  int version = si.GetInt32 ("_version");
  if (version >= 2) MyNewField = si.GetString ("MyNewField");
  ...
}



Subclassing Serializable Classes 


In the preceding examples, we sealed the classes that relied on attributes for
serialization. To see why, consider the following class hierarchy:


[Serializable] public class Person
{
  public string Name;
  public int Age;
}

[Serializable] public sealed class Student : Person
{
  public string Course;
}

In this example, both Person and Student are serializable, and both classes use the
default runtime serialization behavior because neither class implements 
ISerializable. 


Now imagine that the developer of Person decides for some reason to implement
ISerializable and provide a deserialization constructor to control Person
serialization. The new version of Person might look like this:


[Serializable] public class Person : ISerializable
{
  public string Name;
  public int Age;

  public virtual void GetObjectData (SerializationInfo si,
                                     StreamingContext sc)
  {
    si.AddValue ("Name", Name);
    si.AddValue ("Age", Age);
  }

  protected Person (SerializationInfo si, StreamingContext sc)
  {
    Name = si.GetString ("Name");
    Age = si.GetInt32 ("Age");
  }

  public Person() {}
}


Although this works for instances of Person, this change breaks serialization of
Student instances. Serializing a Student instance would appear to succeed, but the
Course field in the Student type isn’t saved to the stream because the
implementation of ISerializable.GetObjectData on Person has no knowledge of the
members of the Student-derived type. Additionally, deserialization of Student
instances throws an exception because the runtime is looking (unsuccessfully) for a
deserialization constructor on Student.


The solution to this problem is to implement ISerializable from the outset for 
serializable classes that are public and nonsealed. (With internal classes, it’s not so 
important because you can easily modify the subclasses later if required.) 


If we started out by writing Person, as in the preceding example, Student would 
then be written as follows: 


[Serializable]
public class Student : Person
{
  public string Course;

  public override void GetObjectData (SerializationInfo si,
                                      StreamingContext sc)
  {
    base.GetObjectData (si, sc);
    si.AddValue ("Course", Course);
  }

  protected Student (SerializationInfo si, StreamingContext sc)
    : base (si, sc)
  {
    Course = si.GetString ("Course");
  }

  public Student() {}
}
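BinaryFormatter is obsolete in later versions of .NET and is disabled or removed in the most recent ones. The following sketch therefore exercises the same ISerializable contract by hand. It plays the formatter's role by calling GetObjectData and then invoking the protected deserialization constructor through reflection. ManualRoundTrip and Clone are hypothetical names. Some of the types used (such as FormatterConverter) may produce obsolescence warnings on newer SDKs:

```csharp
using System;
using System.Reflection;
using System.Runtime.Serialization;

[Serializable] public class Person : ISerializable
{
  public string Name;
  public int Age;

  public virtual void GetObjectData (SerializationInfo si, StreamingContext sc)
  {
    si.AddValue ("Name", Name);
    si.AddValue ("Age", Age);
  }

  protected Person (SerializationInfo si, StreamingContext sc)
  {
    Name = si.GetString ("Name");
    Age = si.GetInt32 ("Age");
  }

  public Person() {}
}

[Serializable] public class Student : Person
{
  public string Course;

  public override void GetObjectData (SerializationInfo si, StreamingContext sc)
  {
    base.GetObjectData (si, sc);     // Let Person write its own fields first
    si.AddValue ("Course", Course);
  }

  protected Student (SerializationInfo si, StreamingContext sc) : base (si, sc)
  {
    Course = si.GetString ("Course");
  }

  public Student() {}
}

public static class ManualRoundTrip
{
  // Mimics a formatter: collect the data with GetObjectData, then rebuild the
  // object through its protected deserialization constructor via reflection
  public static T Clone<T> (T obj) where T : ISerializable
  {
    var context = new StreamingContext (StreamingContextStates.Clone);
    var info = new SerializationInfo (typeof (T), new FormatterConverter());
    obj.GetObjectData (info, context);

    return (T) Activator.CreateInstance (typeof (T),
      BindingFlags.Instance | BindingFlags.NonPublic,
      null, new object[] { info, context }, null);
  }

  public static void Main()
  {
    var s = new Student { Name = "Sara", Age = 21, Course = "Physics" };
    Student copy = ManualRoundTrip.Clone (s);
    Console.WriteLine (copy.Name + " " + copy.Age + " " + copy.Course);
    // Sara 21 Physics
  }
}
```

Because Student overrides GetObjectData and chains to the base deserialization constructor, the Course field survives alongside Name and Age.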
18


Assemblies

An assembly is the basic unit of deployment in .NET Core and is also the container
for all types. An assembly contains compiled types with their IL code, runtime
resources, and information to assist with versioning and referencing other
assemblies. An assembly also defines a boundary for type resolution. In .NET Core, an
assembly comprises a single file with a .dll extension.


When you build an executable application in .NET Core, you
end up with two files: an assembly (.dll) and an executable
launcher (.exe) appropriate to the platform you're targeting.


This differs from what happens in .NET Framework, which
generates a portable executable (PE) assembly. A PE has an .exe
extension and acts both as an assembly and an application
launcher. A PE can simultaneously target 32- and 64-bit
versions of Windows.


.NET Core also lets you reference WinRT libraries, which have a .winmd extension. 
Structurally, they are similar to assemblies, but contain only metadata and no IL 
code. 


Most of the types in this chapter come from the following namespaces: 


System.Reflection 
System.Resources 
System.Globalization 


What's in an Assembly 


An assembly contains four kinds of things: 


An assembly manifest 
Provides information to the CLR, such as the assembly’s name, version, and 
other assemblies that it references 

An application manifest
Provides information to the operating system, such as how the assembly should 
be deployed and whether administrative elevation is required 


Compiled types 
The compiled IL code and metadata of the types defined within the assembly 


Resources 
Other data embedded within the assembly, such as images and localizable text 


Of these, only the assembly manifest is mandatory, although an assembly nearly
always contains compiled types (unless it’s a resource assembly; see “Resources and
Satellite Assemblies” on page 768).


The Assembly Manifest 


The assembly manifest serves two purposes: 


• It describes the assembly to the managed hosting environment.

• It acts as a directory to the modules, types, and resources in the assembly.


Assemblies are thus self-describing. A consumer can discover all of an assembly’s
data, types, and functions without needing additional files.


An assembly manifest is not something you add explicitly to 
an assembly—it’s automatically embedded into an assembly as 
part of compilation. 


Here's a summary of the functionally significant data stored in the manifest: 


• The simple name of the assembly

• A version number (AssemblyVersion)

• A public key and signed hash of the assembly, if strongly named

• A list of referenced assemblies, including their version and public key

• A list of types defined in the assembly

• The culture it targets, if a satellite assembly (AssemblyCulture)


The manifest can also store the following informational data:


• A full title and description (AssemblyTitle and AssemblyDescription)

• Company and copyright information (AssemblyCompany and AssemblyCopyright)

• A display version (AssemblyInformationalVersion)

• Additional attributes for custom data

Some of this data is derived from arguments given to the compiler, such as the list of
referenced assemblies or the public key with which to sign the assembly. The rest 
comes from assembly attributes, indicated in parentheses. 


You can view the contents of an assembly’s manifest with 
the .NET tool ildasm.exe. In Chapter 19, we describe how to 
use reflection to do the same programmatically. 
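As a small preview, the following sketch reads a few manifest-derived facts about the current assembly through reflection. ManifestDemo is a hypothetical name. Which informational attributes are present depends on how the project is built, so the display version may well be empty:

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ManifestDemo
{
  // The assembly's identity: simple name, version, culture, public key token
  public static AssemblyName Identity() => typeof (ManifestDemo).Assembly.GetName();

  // Simple names of the assemblies listed as references in the manifest
  public static string[] ReferencedNames() =>
    typeof (ManifestDemo).Assembly.GetReferencedAssemblies()
      .Select (r => r.Name)
      .ToArray();

  public static void Main()
  {
    AssemblyName name = Identity();
    Console.WriteLine ("Simple name: " + name.Name);
    Console.WriteLine ("Version:     " + name.Version);

    foreach (string r in ReferencedNames())
      Console.WriteLine ("References:  " + r);

    // Informational attributes are optional, so this may print an empty value
    var info = typeof (ManifestDemo).Assembly
      .GetCustomAttribute<AssemblyInformationalVersionAttribute>();
    Console.WriteLine ("Display version: " + info?.InformationalVersion);
  }
}
```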


Specifying assembly attributes 


Commonly used assembly attributes can be specified in Visual Studio on the
project’s Properties page, on the Package tab. The settings on that tab are added to the
project file (.csproj).


To specify attributes not supported by the Package tab, or if not working with 
a .csproj file, you can specify assembly attributes in source code. .NET Framework 
projects automatically create a file for this purpose, AssemblyInfo.cs in the Properties 
folder, but .NET Core projects do not. Although you can specify attributes in any 
source code file in your project, adding a .cs file specifically for attributes allows you 
to keep them together and well organized. 


A dedicated attributes file contains only using directives and assembly attribute
declarations. For example, to expose internally scoped types to a unit test project,
you would do this:


using System.Runtime.CompilerServices; 


[assembly: InternalsVisibleTo ("MyUnitTestProject")]


The Application Manifest (Windows) 


An application manifest is an XML file that communicates information about the
assembly to the OS. An application manifest is embedded into the startup
executable as a Win32 resource during the build process. If present, the manifest is
read and processed before the CLR loads the assembly, and it can influence how
Windows launches the application’s process.


A .NET application manifest has a root element called assembly in the XML
namespace urn:schemas-microsoft-com:asm.v1:


<?xml version="1.0" encoding="utf-8"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <!-- contents of manifest -->
</assembly>


The following manifest instructs the OS to request administrative elevation: 


<?xml version="1.0" encoding="utf-8"?> 
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1"> 
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v2"> 
<security> 
<requestedPrivileges> 
<requestedExecutionLevel level="requireAdministrator" /> 
</requestedPrivileges>
</security> 
</trustInfo> 
</assembly> 


We describe the consequences of requesting administrative elevation in Chapter 21. 


UWP applications have a far more elaborate manifest, described in the
Package.appxmanifest file. This includes a declaration of the program's capabilities,
which determine permissions granted by the OS. The easiest way to edit this file is
with Visual Studio, which displays a dialog when you double-click the manifest file.


Deploying an application manifest 


You can add an application manifest to a .NET Core project in Visual Studio by
right-clicking your project in Solution Explorer, selecting Add, then “New item,”
and then choosing Application Manifest File. Upon building, the manifest will be
embedded into the output assembly.


The .NET tool ildasm.exe is blind to the presence of an
embedded application manifest. Visual Studio, however,
indicates whether an embedded application manifest is present if
you double-click the assembly in Solution Explorer.


Modules 


The contents of an assembly are actually packaged within an intermediate container, 
called a module. A module corresponds to a file containing the contents of an 
assembly. The reason for this extra layer of containership is to allow an assembly to 
span multiple files, a feature present in .NET Framework but absent in .NET Core. 
Figure 18-1 illustrates the relationship. 





[Figure: an assembly, MyApp.exe, made up of a single module that holds the
manifest (mandatory), IL code + type metadata (optional), and resources (optional).]

Figure 18-1. Single-file assembly


Although .NET Core does not support multifile assemblies, at times you need to be 
aware of the extra level of containership that modules impose. The main scenario is 
with reflection (see “Reflecting Assemblies” on page 817 and “Emitting Assemblies 
and Types” on page 830 in Chapter 19). 
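The module layer is visible through reflection. The following sketch assumes an ordinary compiled .NET Core assembly and shows that the assembly has exactly one module, which is also its manifest module. ModuleDemo is a hypothetical name:

```csharp
using System;
using System.Reflection;

public static class ModuleDemo
{
  public static Assembly Current => typeof (ModuleDemo).Assembly;

  // True if the assembly's only module is the one that holds the manifest
  public static bool SingleModuleIsManifest()
  {
    Module[] modules = Current.GetModules();
    return modules.Length == 1 &&
           modules[0].ModuleVersionId == Current.ManifestModule.ModuleVersionId;
  }

  public static void Main()
  {
    Console.WriteLine (Current.GetModules().Length);   // 1
    Console.WriteLine (SingleModuleIsManifest());      // True
    Console.WriteLine (Current.ManifestModule.Name);   // The module's file name
  }
}
```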

The Assembly Class


The Assembly class in System.Reflection is a gateway to accessing assembly
metadata at runtime. There are a number of ways to obtain an assembly object: the
simplest is via a Type’s Assembly property:


Assembly a = typeof (Program).Assembly; 


You can also obtain an Assembly object by calling one of Assembly’s static methods:


GetExecutingAssembly 


Returns the assembly of the type that defines the currently executing function 


GetCallingAssembly 


Does the same as GetExecutingAssembly but for the function that called the 
currently executing function 


GetEntryAssembly 


Returns the assembly defining the application's original entry method 
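For example, the following sketch obtains the current assembly in two ways and confirms that both return the same object. AssemblyDemo is a hypothetical name:

```csharp
using System;
using System.Reflection;

public static class AssemblyDemo
{
  // The simplest route: a Type's Assembly property
  public static Assembly ViaType() => typeof (AssemblyDemo).Assembly;

  // The assembly of the type that defines the currently executing method
  public static Assembly Executing() => Assembly.GetExecutingAssembly();

  public static void Main()
  {
    Console.WriteLine (ViaType() == Executing());   // True
    Console.WriteLine (ViaType().GetName().Name);

    // GetEntryAssembly can return null (for example, when called from unmanaged code)
    Console.WriteLine (Assembly.GetEntryAssembly()?.GetName().Name);
  }
}
```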


After you have an Assembly object, you can use its properties and methods to query
the assembly’s metadata and reflect upon its types. Table 18-1 shows a summary of
these functions.


Table 18-1. Assembly members

Functions | Purpose | See the section...
----------|---------|-------------------
FullName, GetName | Returns the fully qualified name or an AssemblyName object | “Assembly Names” on page 763
CodeBase, Location | Location of the assembly file | “Loading, Resolving, and Isolating Assemblies” on page 775
Load, LoadFrom, LoadFile | Manually loads an assembly into memory | “Loading, Resolving, and Isolating Assemblies” on page 775
GetSatelliteAssembly | Locates the satellite assembly of a given culture | “Resources and Satellite Assemblies” on page 768
GetType, GetTypes | Returns a type, or all types, defined in the assembly | “Reflecting and Activating Types” on page 798 in Chapter 19
EntryPoint | Returns the application’s entry method, as a MethodInfo | “Reflecting and Invoking Members” on page 805 in Chapter 19
GetModule, GetModules, ManifestModule | Returns all modules, or the main module, of an assembly | “Reflecting Assemblies” on page 817 in Chapter 19
GetCustomAttribute, GetCustomAttributes | Returns the assembly’s attributes | “Working with Attributes” on page 818 in Chapter 19

Strong Names and Assembly Signing


A strongly named assembly has a unique identity. It works by adding two bits of 
metadata to the manifest: 


• A unique number that belongs to the authors of the assembly

• A signed hash of the assembly, proving that the unique number holder
  produced the assembly


This requires a public/private key pair. The public key provides the unique
identifying number, and the private key facilitates signing.


The public key is valuable in guaranteeing the uniqueness of assembly references: a 
strongly named assembly incorporates the public key into its identity. 
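You can see the public key's role in identity by reading an assembly's public key token. The token is a short hash of the public key and appears in the assembly's full name. The following sketch assumes that the core library is strongly named, as it is in current .NET releases. StrongNameDemo and TokenOf are hypothetical names:

```csharp
using System;
using System.Reflection;

public static class StrongNameDemo
{
  // Formats an assembly's public key token as lowercase hex
  public static string TokenOf (Assembly a)
  {
    byte[] token = a.GetName().GetPublicKeyToken();
    return token == null || token.Length == 0
      ? "(not strongly named)"
      : BitConverter.ToString (token).Replace ("-", "").ToLowerInvariant();
  }

  public static void Main()
  {
    // The core library's full name includes its PublicKeyToken
    Console.WriteLine (typeof (object).Assembly.FullName);
    Console.WriteLine (TokenOf (typeof (object).Assembly));

    // An assembly compiled without signing reports no token
    Console.WriteLine (TokenOf (typeof (StrongNameDemo).Assembly));
  }
}
```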


Strongly naming an assembly is important in .NET
Framework for two reasons:


• It allows the assembly to be loaded into the “global
  assembly cache.”

• It allows the assembly to be referenced by other strongly
  named assemblies.


Strong naming is much less important in .NET Core,
because .NET Core does not have a global assembly cache; nor
does it impose the second restriction.


In .NET Framework, the private key protects your assembly from tampering, in that without your private key, no one can release a modified version of the assembly without the signature breaking. In practice, this is of use when loading an assembly into .NET Framework’s global assembly cache. In .NET Core, the signature is of little use because it’s never checked.


Adding a strong name to a previously “weak” named assembly changes its identity. 
For this reason, it pays to strong-name an assembly from the outset, if you think the 
assembly might need a strong name in the future. 


Strong-name-signing is not the same as Authenticode-signing. 
We cover Authenticode later in this chapter. 


How to Strongly Name an Assembly 


To give an assembly a strong name, first generate a public/private key pair with the 
sn.exe utility: 


sn.exe -k MyKeyPair.snk 
Visual Studio installs a shortcut called Developer Command
Prompt for VS, which starts a command prompt whose PATH 
contains development tools such as sn.exe. 


This manufactures a new key pair and stores it to a file called MyKeyPair.snk. If you 
subsequently lose this file, you will permanently lose the ability to recompile your 
assembly with the same identity. 


You can sign an assembly with this file by updating your project file. From Visual 
Studio, go to the Project Properties window, and then, on the Signing tab, select the 
“Sign the assembly” checkbox and select your .snk file. 


The same key pair can sign multiple assemblies; they’ll still have distinct identities if their simple names differ.


Assembly Names 


An assembly’s “identity” comprises four pieces of metadata from its manifest: 


• Its simple name

• Its version (“0.0.0.0” if not present)

• Its culture (“neutral” if not a satellite)

• Its public key token (“null” if not strongly named)

The simple name comes not from any attribute, but from the name of the file to which it was originally compiled (less any extension). So, the simple name of the System.Xml.dll assembly is “System.Xml.” Renaming a file doesn’t change the assembly’s simple name.


The version number comes from the AssemblyVersion attribute. It’s a string divided 
into four parts as follows: 


major.minor.build.revision

You can specify a version number as follows:

[assembly: AssemblyVersion ("2.5.6.7")]
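Each part of the version string maps to a property of System.Version. As a minimal sketch (VersionParts is a hypothetical helper name):

```csharp
using System;

public static class VersionParts
{
    // Splits a version string into its four components.
    public static (int Major, int Minor, int Build, int Revision) Parse (string text)
    {
        Version v = Version.Parse (text);   // throws FormatException if malformed
        return (v.Major, v.Minor, v.Build, v.Revision);
    }
}
```

If a string supplies only some components, such as "2.5", the missing Build and Revision values are reported as -1.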


The culture comes from the AssemblyCulture attribute and applies to satellite 
assemblies, described later in the section “Resources and Satellite Assemblies” on 
page 768. 


The public key token comes from the strong name supplied at compile time, as we 
discussed in the preceding section. 
Fully Qualified Names 


A fully qualified assembly name is a string that includes all four identifying components, in this format:


simple-name, Version=version, Culture=culture, PublicKeyToken=public-key-token
For example, the fully qualified name of System.Private.CoreLib.dll is System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e.


If the assembly has no AssemblyVersion attribute, the version appears as 0.0.0.0. 
If it is unsigned, its public key token appears as null. 


An Assembly object’s FullName property returns its fully qualified name. The compiler always uses fully qualified names when recording assembly references in the manifest.


A fully qualified assembly name does not include a directory path to assist in locating it on disk. Locating an assembly residing in another directory is an entirely separate matter that we pick up in “Loading, Resolving, and Isolating Assemblies” on page 775.


The AssemblyName Class 
AssemblyName is a class with a typed property for each of the four components of a 
fully qualified assembly name. AssemblyName has two purposes: 

• It parses or builds a fully qualified assembly name.

• It stores some extra data to assist in resolving (finding) the assembly.

You can obtain an AssemblyName object in any of the following ways:

• Instantiate an AssemblyName, providing a fully qualified name

• Call GetName on an existing Assembly

• Call AssemblyName.GetAssemblyName, providing the path to an assembly file on disk


You can also instantiate an AssemblyName object without any arguments and then set each of its properties to build a fully qualified name. An AssemblyName is mutable when constructed in this manner.


Here are its essential properties and methods: 


string      FullName    { get; }          // Fully qualified name
string      Name        { get; set; }     // Simple name
Version     Version     { get; set; }     // Assembly version
CultureInfo CultureInfo { get; set; }     // For satellite assemblies
string      CodeBase    { get; set; }     // Location

byte[] GetPublicKey();                    // 160 bytes
void   SetPublicKey (byte[] key);
byte[] GetPublicKeyToken();               // 8-byte version
void   SetPublicKeyToken (byte[] publicKeyToken);
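As a sketch of both uses, the following parses the fully qualified name of System.Private.CoreLib shown earlier and also builds a name property by property. AssemblyNameDemo and the simple name Widgets are hypothetical; in neither case is a file on disk consulted.

```csharp
using System;
using System.Globalization;
using System.Reflection;

public static class AssemblyNameDemo
{
    public const string CoreLibName =
        "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e";

    // Parses a fully qualified name into its typed components.
    public static AssemblyName Parse (string fullName) => new AssemblyName (fullName);

    // Builds a (mutable) name from scratch; "Widgets" is a hypothetical assembly.
    public static AssemblyName Build() => new AssemblyName
    {
        Name = "Widgets",
        Version = new Version (1, 2, 0, 0),
        CultureInfo = CultureInfo.InvariantCulture   // shown as Culture=neutral
    };
}
```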
Version is itself a strongly typed representation, with properties for Major, Minor, Build, and Revision numbers. GetPublicKey returns the full cryptographic public key; GetPublicKeyToken returns the eight-byte token, derived from a hash of the public key, that is used in establishing identity.


To use AssemblyName to obtain the simple name of an assembly:


Console.WriteLine (typeof (string).Assembly.GetName().Name); 
// System.Private.CoreLib 


To get an assembly version: 
string v = myAssembly.GetName().Version.ToString(); 


We examine the CodeBase property in “Loading, Resolving, and Isolating Assemblies” on page 775.


Assembly Informational and File Versions 


Two further assembly attributes are available for expressing version-related information. Unlike AssemblyVersion, the following two attributes do not affect an assembly’s identity and so have no effect on what happens at compile time or at runtime:


AssemblyInformationalVersion 
The version as displayed to the end user. This is visible in the Windows File 
Properties dialog box as Product Version. Any string can go here, such as “5.1 
Beta 2.” Typically, all of the assemblies in an application would be assigned the 
same informational version number. 


AssemblyFileVersion
This is intended to refer to the build number for that assembly. This is visible in the Windows File Properties dialog box as File Version. As with AssemblyVersion, it must contain a string consisting of up to four numbers separated by periods.
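Because these two attributes don’t participate in identity, you read them like any other custom attribute. A minimal sketch, with VersionAttributes as a hypothetical helper name:

```csharp
using System.Reflection;

public static class VersionAttributes
{
    // Returns the informational (product) version, or null if the attribute is absent.
    public static string InformationalVersion (Assembly a) =>
        a.GetCustomAttribute<AssemblyInformationalVersionAttribute>()?.InformationalVersion;

    // Returns the file version, or null if the attribute is absent.
    public static string FileVersion (Assembly a) =>
        a.GetCustomAttribute<AssemblyFileVersionAttribute>()?.Version;
}
```

SDK-style projects normally generate both attributes for you, so declaring them by hand in source can trigger a duplicate-attribute compile error unless that generation is switched off.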


Authenticode Signing 


Authenticode is a code-signing system whose purpose is to prove the identity of the 
publisher. Authenticode and strong-name signing are independent: you can sign an 
assembly with either or both systems. 


Although strong-name signing can prove that assemblies A, B, and C came from the 
same party (assuming the private key hasn't been leaked), it can’t tell you who that 
party was. To know that the party was Joe Albahari—or Microsoft Corporation— 
you need Authenticode. 


Authenticode is useful when downloading programs from the internet, because it provides assurance that a program came from whoever was named by the Certificate Authority and was not modified in transit. It also prevents the Unknown Publisher warning when running a downloaded application for the first time.
Authenticode signing is also a requirement when submitting apps to the Windows
Store.


Authenticode works not only with .NET assemblies, but also with unmanaged executables and binaries such as .msi deployment files. Of course, Authenticode doesn’t guarantee that a program is free from malware, although it does make it less likely: a person or entity has been willing to put its name (backed by a passport or company document) behind the executable or library.


The CLR does not treat an Authenticode signature as part of 
an assembly’s identity. However, it can read and validate 
Authenticode signatures on demand, as you'll see soon. 


Signing with Authenticode requires that you contact a Certificate Authority (CA) with evidence of your personal identity or company’s identity (articles of incorporation, etc.). After the CA has checked your documents, it will issue an X.509 code-signing certificate that is typically valid for one to five years. This enables you to sign assemblies with the signtool utility. You can also make a certificate yourself with the makecert utility; however, it will be recognized only on computers on which the certificate is explicitly installed.


The fact that (non-self-signed) certificates can work on any computer relies on public key infrastructure. Essentially, your certificate is signed with another certificate belonging to a CA. The CA is trusted because the root certificates of trusted CAs are preloaded into the OS. (To see them, go to the Windows Control Panel and type “certificate” in the search box. In the Administrative Tools section, click “Manage computer certificates.” This launches the Certificate Manager. Open the node Trusted Root Certification Authorities and click Certificates.) A CA can revoke a publisher’s certificate if it is leaked, so verifying an Authenticode signature requires periodically asking the CA for an up-to-date list of certificate revocations.


Because Authenticode uses cryptographic signing, an Authenticode signature is 
invalid if someone subsequently tampers with the file. We discuss cryptography, 
hashing, and signing in Chapter 21. 


How to Sign with Authenticode 


Obtaining and installing a certificate 


The first step is to obtain a code-signing certificate from a CA (see the sidebar that 
follows). You can then either work with the certificate as a password-protected file, 
or load the certificate into the computer’s certificate store. The benefit of doing the 
latter is that you can sign without needing to specify a password. This is 
advantageous because it avoids having a password visible in automated build scripts 
or batch files. 
Where to Get a Code-Signing Certificate


Just a handful of code-signing CAs are preloaded into Windows as root certification authorities. These include Comodo, Go Daddy, GlobalSign, DigiCert, thawte, and Symantec.


There are also resellers such as Ksoftware that offer discounted code-signing certificates from the aforementioned authorities.


The Authenticode certificates issued by Ksoftware, Comodo, Go Daddy, and GlobalSign are advertised as less restrictive in that they will also sign non-Microsoft programs. Aside from this, the products from all vendors are functionally equivalent.


Note that a certificate for SSL cannot generally be used for Authenticode signing 
(despite using the same X.509 infrastructure). This is, in part, because a certificate 
for SSL is about proving ownership of a domain; Authenticode is about proving 
who you are. 











To load a certificate into the computer’s certificate store, open the Certificate Manager as described earlier. Open the Personal folder, right-click its Certificates folder, and then pick All Tasks/Import. An import wizard guides you through the process. After the import is complete, click the View button on the certificate, go to the Details tab, and copy the certificate’s thumbprint. This is the SHA-1 hash that you’ll subsequently need to identify the certificate when signing.


If you also want to strong-name-sign your assembly, you must 
do so before Authenticode signing. This is because the CLR 
knows about Authenticode signing, but not vice versa. So, if 
you strong-name-sign an assembly after Authenticode-signing 
it, the latter will see the addition of the CLR’s strong name as 
an unauthorized modification, and consider the assembly 
tampered. 


Signing with signtool.exe 


You can Authenticode-sign your programs with the signtool utility that comes with Visual Studio (look in the Microsoft SDKs\ClickOnce\SignTool folder under Program Files). The following signs a file called LINQPad.exe with the certificate located in the computer’s My Store called “Joseph Albahari,” using the secure SHA256 hashing algorithm:


signtool sign /n "Joseph Albahari" /fd sha256 LINQPad.exe

You can also specify a description and product URL with /d and /du:

... /d LINQPad /du http://www.linqpad.net


In most cases, you will also want to specify a time-stamping server. 
Time stamping


After your certificate expires, you’ll no longer be able to sign programs. However, programs that you signed before its expiry will still be valid, provided you specified a timestamping server with the /tr switch when signing. The CA will provide you with a URI for this purpose; the following is for Comodo (or Ksoftware):


... /tr http://timestamp.comodoca.com/authenticode /td SHA256 


Verifying that a program has been signed 


The easiest way to view an Authenticode signature on a file is to view the file’s properties in Windows Explorer (look in the Digital Signatures tab). The signtool utility also provides an option for this (signtool verify /pa, followed by the filename).


Resources and Satellite Assemblies 


An application typically contains not only executable code, but also content such as 
text, images, or XML files. Such content can be represented in an assembly through 
a resource. There are two overlapping use cases for resources: 

• Incorporating data that cannot go into source code, such as images

• Storing data that might need translation in a multilingual application

An assembly resource is ultimately a byte stream with a name. You can think of an assembly as containing a dictionary of byte arrays keyed by string. You can see this in ildasm if you disassemble an assembly that contains a resource called banner.jpg and a resource called data.xml:


.mresource public banner.jpg
{
  // Offset: 0x00000F58 Length: 0x000004F6
}
.mresource public data.xml
{
  // Offset: 0x00001458 Length: 0x0000027E
}


In this case, banner.jpg and data.xml were included directly in the assembly—each 
as its own embedded resource. This is the simplest way to work. 


The Framework also lets you add content through intermediate .resources containers. These are designed for holding content that might require translation into different languages. Localized .resources can be packaged as individual satellite assemblies that are automatically picked up at runtime, based on the user’s OS language.


Figure 18-2 illustrates an assembly that contains two directly embedded resources, 
plus a .resources container called welcome.resources, for which we've created two 
localized satellites. 





768 | Chapter 18: Assemblies 





Figure 18-2. Resources


Directly Embedding Resources 


Embedding resources into assemblies is not supported in Windows Store apps. Instead, add any extra files to your deployment package, and access them by reading from your application’s StorageFolder (Package.Current.InstalledLocation).


To directly embed a resource using Visual Studio: 


• Add the file to your project.


• Set its build action to Embedded Resource.


Visual Studio always prefixes resource names with the project’s default namespace, plus the names of any subfolders in which the file is contained. So, if your project’s default namespace was Westwind.Reports and your file was called banner.jpg in the folder pictures, the resource name would be Westwind.Reports.pictures.banner.jpg.


Resource names are case sensitive. This makes project subfolder names in Visual Studio that contain resources effectively case sensitive.


To retrieve a resource, you call GetManifestResourceStream on the assembly containing the resource. This returns a stream, which you can then read as any other:


// Requires using directives for System.IO, System.Reflection, and System.Xml
Assembly a = Assembly.GetEntryAssembly();

using (Stream s = a.GetManifestResourceStream ("TestProject.data.xml"))
using (XmlReader r = XmlReader.Create (s))
  while (r.Read()) { /* process the XML here */ }

System.Drawing.Image image;   // from the System.Drawing.Common package
using (Stream s = a.GetManifestResourceStream ("TestProject.banner.jpg"))
  image = System.Drawing.Image.FromStream (s);


The stream returned is seekable, so you can also do this: 


byte[] data; 
using (Stream s = a.GetManifestResourceStream ("TestProject.banner.jpg")) 
data = new BinaryReader (s).ReadBytes ((int) s.Length); 


If you’ve used Visual Studio to embed the resource, you must remember to include the namespace-based prefix. To help avoid error, you can specify the prefix in a separate argument, using a type. The type’s namespace is used as the prefix:


using (Stream s = a.GetManifestResourceStream (typeof (X), "data.xml")) 


X can be any type with the desired namespace of your resource (typically, a type in 
the same project folder). 


Setting a project item’s build action in Visual Studio to Resource within a WPF application is not the same as setting its build action to Embedded Resource. The former actually adds the item to a .resources file called <AssemblyName>.g.resources, whose content you access through WPF’s Application class, using a URI as a key.


To add to the confusion, WPF further overloads the term resource. Static resources and dynamic resources are both unrelated to assembly resources!


GetManifestResourceNames returns the names of all resources in the assembly. 
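Putting GetManifestResourceNames and GetManifestResourceStream together gives a simple way to load every embedded resource. In this sketch (EmbeddedResources is a hypothetical name), note that GetManifestResourceStream returns null rather than throwing when no resource matches, which is a common symptom of a missing namespace prefix.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class EmbeddedResources
{
    // Reads one resource fully, or returns null if no resource has that name.
    public static byte[] ReadBytes (Assembly assembly, string name)
    {
        using (Stream s = assembly.GetManifestResourceStream (name))
        {
            if (s == null) return null;        // not found: check the prefix
            var buffer = new MemoryStream();
            s.CopyTo (buffer);
            return buffer.ToArray();
        }
    }

    // Loads every embedded resource, keyed by its full (prefixed) name.
    public static Dictionary<string, byte[]> ReadAll (Assembly assembly)
    {
        var result = new Dictionary<string, byte[]>();
        foreach (string name in assembly.GetManifestResourceNames())
            result [name] = ReadBytes (assembly, name);
        return result;
    }
}
```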


.resources Files 


.resources files are containers for potentially localizable content. A .resources file 
ends up as an embedded resource within an assembly—just like any other kind of 
file. The difference is that you must do the following: 


• Package your content into the .resources file to begin with


• Access its content through a ResourceManager or pack URI rather than through GetManifestResourceStream


.resources files are structured in binary and so are not human-editable; therefore, 
you must rely on tools provided by the Framework and Visual Studio to work with 
them. The standard approach with strings or simple data types is to use the .resx 
format, which can be converted to a .resources file either by Visual Studio or the 
resgen tool. The .resx format is also suitable for images intended for a Windows 
Forms or ASP.NET application. 
In a WPF application, you must use Visual Studio's “Resource” build action for
images or similar content needing to be referenced by URI. This applies whether
localization is needed or not.


We describe how to do each of these in the following sections. 


.resx Files 


A .resx file is a design-time format for producing .resources files. A .resx file uses 
XML and is structured with name/value pairs as follows: 


<root> 
<data name="Greeting"> 
<value>hello</value> 
</data> 
<data name="DefaultFontSize" type="System.Int32, mscorlib"> 
<value>10</value> 
</data> 
</root> 


To create a .resx file in Visual Studio, add a project item of type Resources File. The 
rest of the work is done automatically: 


• The correct header is created.


• A designer is provided for adding strings, images, files, and other kinds of data.


• The .resx file is automatically converted to the .resources format and embedded into the assembly upon compilation.


• A class is written to help you access the data later on.


The resource designer adds images as typed Image objects 
(System.Drawing.dll) rather than as byte arrays, making them 
unsuitable for WPF applications. 


Reading .resources files 
The ResourceManager class reads .resources files embedded within an assembly: 


ResourceManager r = new ResourceManager ("welcome",
                                         Assembly.GetExecutingAssembly());


(The first argument must be namespace-prefixed if the resource was compiled in 
Visual Studio.) 


If you create a .resx file in Visual Studio, a class of the same 
name is generated automatically with properties to retrieve 
each of its items. 


You can then access what's inside by calling GetString or GetObject with a cast: 
string greeting = r.GetString ("Greeting");
int fontSize = (int) r.GetObject ("DefaultFontSize");
Image image = (Image) r.GetObject ("flag.png");


To enumerate the contents of a .resources file: 


ResourceManager r = new ResourceManager (...);
ResourceSet set = r.GetResourceSet (CultureInfo.CurrentUICulture,
                                    true, true);
foreach (System.Collections.DictionaryEntry entry in set)
  Console.WriteLine (entry.Key);
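To see the .resources format without Visual Studio, you can generate a container in memory with System.Resources.ResourceWriter and enumerate it with ResourceReader. This sketch stores only a string and an int (echoing the earlier .resx entries). ResourcesFileDemo is a hypothetical name, and in .NET Core, values other than strings and primitive types may need additional serialization support.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Resources;

public static class ResourcesFileDemo
{
    // Writes a tiny .resources container into memory.
    public static byte[] BuildResources()
    {
        var ms = new MemoryStream();
        using (var writer = new ResourceWriter (ms))
        {
            writer.AddResource ("Greeting", "hello");
            writer.AddResource ("DefaultFontSize", 10);   // boxed Int32
            writer.Generate();
        }                        // disposing the writer also closes the stream
        return ms.ToArray();     // ToArray still works on a closed MemoryStream
    }

    // Reads every name/value pair back out of a .resources blob.
    public static Dictionary<string, object> ReadAll (byte[] resources)
    {
        var result = new Dictionary<string, object>();
        using (var reader = new ResourceReader (new MemoryStream (resources)))
            foreach (DictionaryEntry entry in reader)
                result [(string) entry.Key] = entry.Value;
        return result;
    }
}
```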


Creating a pack URI resource in Visual Studio 


In a WPF application, XAML files need to be able to access resources by URI; for 
instance: 


<Button> 
<Image Height="50" Source="flag.png"/> 
</Button> 


Or, if the resource is in another assembly: 


<Button> 
<Image Height="50" Source="UtilsAssembly;Component/flag.png"/>
</Button> 


(Component is a literal keyword.) 


To create resources that can be loaded in this manner, you cannot use .resx files. 
Instead, you must add the files to your project and set their build action to Resource 
(not Embedded Resource). Visual Studio then compiles them into a .resources file 
called <AssemblyName>.g.resources—also the home of compiled XAML (.baml) 
files. 


To load a URI-keyed resource programmatically, call Application.GetResourceStream:


Uri u = new Uri ("flag.png", UriKind.Relative);
using (Stream s = Application.GetResourceStream (u).Stream)


Notice we used a relative URI. You can also use an absolute URI in exactly the following format (the three commas are not a typo):

Uri u = new Uri ("pack://application:,,,/flag.png");

If you’d rather specify an Assembly object, you can retrieve content instead with a ResourceManager:


Assembly a = Assembly.GetExecutingAssembly();
ResourceManager r = new ResourceManager (a.GetName().Name + ".g", a);
using (Stream s = r.GetStream ("flag.png"))


A ResourceManager also lets you enumerate the content of a .g.resources container 
within a given assembly. 
Satellite Assemblies
Data embedded in .resources is localizable. 


Resource localization is relevant when your application runs on a version of Windows built to display everything in a different language. For consistency, your application should use that same language, too.


A typical setup is as follows: 


• The main assembly contains .resources for the default or fallback language.


• Separate satellite assemblies contain localized .resources translated to different languages.


When your application runs, .NET Core examines the language of the current OS 
(from CultureInfo.CurrentUICulture). Whenever you request a resource using 
ResourceManager, the Framework looks for a localized satellite assembly. If one’s 
available—and it contains the resource key you requested—it’s used in place of the 
main assembly's version. 


This means that you can enhance language support simply by adding new 
satellites—without changing the main assembly. 


A satellite assembly cannot contain executable code, only 
resources. 


Satellite assemblies are deployed in subdirectories of the assembly’s folder as 
follows: 


programBaseFolder\MyProgram.exe
                 \MyLibrary.dll
                 \XX\MyProgram.resources.dll
                 \XX\MyLibrary.resources.dll


XX refers to the two-letter language code (such as “de” for German) or a language 
and region code (such as “en-GB” for English in Great Britain). This naming system 
allows the CLR to find and load the correct satellite assembly automatically. 


Building satellite assemblies 
Recall our previous .resx example, which included the following: 


<root>
  <data name="Greeting">
    <value>hello</value>
  </data>
</root>


We then retrieved the greeting at runtime as follows: 
ResourceManager r = new ResourceManager ("welcome",
                                         Assembly.GetExecutingAssembly());
Console.Write (r.GetString ("Greeting"));


Suppose that we want this to instead write “hallo” if running on the German version of Windows. The first step is to add another .resx file named welcome.de.resx that substitutes hallo for hello:


<root>
  <data name="Greeting">
    <value>hallo</value>
  </data>
</root>


In Visual Studio, this is all you need to do—when you rebuild, a satellite assembly 
called MyApp.resources.dll is automatically created in a subdirectory called de. 


Testing satellite assemblies 


To simulate running on an OS with a different language, you must change the 
CurrentUICulture using the Thread class: 


System.Threading.Thread.CurrentThread.CurrentUICulture
  = new System.Globalization.CultureInfo ("de");


CultureInfo.CurrentUICulture exposes the same property. (In .NET Core, it is also settable, so you can assign to it directly.)
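When testing this way, it helps to switch the culture temporarily and then restore it, so that one test doesn’t leak its culture into the next. A minimal sketch, with UICultureScope as a hypothetical helper:

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class UICultureScope
{
    // Runs 'action' with the given UI culture, then restores the previous one.
    public static T Run<T> (string cultureName, Func<T> action)
    {
        CultureInfo previous = Thread.CurrentThread.CurrentUICulture;
        Thread.CurrentThread.CurrentUICulture = new CultureInfo (cultureName);
        try { return action(); }
        finally { Thread.CurrentThread.CurrentUICulture = previous; }
    }
}
```

Assuming the de satellite from the preceding example is deployed, UICultureScope.Run ("de", () => r.GetString ("Greeting")) should return “hallo.” Alternatively, ResourceManager.GetString has an overload that accepts a CultureInfo explicitly.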


A useful testing strategy is to localize into words that can still be read as English, but do not use the standard Roman Unicode characters (for example, by substituting look-alike letters).


Visual Studio designer support 


The designers in Visual Studio provide extended support for localizing components 
and visual elements. The WPF designer has its own workflow for localization; other
component-based designers use a design-time-only property to make it appear that a
component or Windows Forms control has a Language property. To customize for 
another language, simply change the Language property and then start modifying 
the component. All properties of controls that are attributed as Localizable will be 
saved to a .resx file for that language. You can switch between languages at any time 
just by changing the Language property. 


Cultures and Subcultures 


Cultures are split into cultures and subcultures. A culture represents a particular 
language; a subculture represents a regional variation of that language. The Frame- 
work follows the RFC1766 standard, which represents cultures and subcultures with 
two-letter codes. Here are the codes for English and German cultures: 


en
de
Here are the codes for the Australian English and Austrian German subcultures:


en-AU
de-AT

A culture is represented in .NET with the System.Globalization.CultureInfo class. You can examine the current culture of your application as follows:


Console.WriteLine (System.Threading.Thread.CurrentThread.CurrentCulture);
Console.WriteLine (System.Threading.Thread.CurrentThread.CurrentUICulture);

Running this on a computer localized for Australia illustrates the difference between the two:

en-AU
en-US

CurrentCulture reflects the regional settings of the Windows Control Panel, whereas CurrentUICulture reflects the language of the OS.


Regional settings include such things as time zone and the formatting of currency 
and dates. CurrentCulture determines the default behavior of such functions as 
DateTime.Parse. Regional settings can be customized to the point where they no 
longer resemble any particular culture. 


CurrentUICulture determines the language in which the computer communicates with the user. Australia doesn’t need a separate version of English for this purpose, so it just uses the US one. If I spent a couple of months working in Austria, I would go to the Control Panel and change my CurrentCulture to Austrian-German. However, given that I can’t speak German, my CurrentUICulture would remain US English.
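The distinction is easy to observe: number formatting follows CurrentCulture, regardless of CurrentUICulture. The following sketch (CultureSplit is a hypothetical name) formats a value with Austrian-German regional settings while the UI culture stays US English, then restores both cultures. It assumes the runtime has culture data available (that is, it isn’t running in globalization-invariant mode).

```csharp
using System.Globalization;
using System.Threading;

public static class CultureSplit
{
    // Formats 'value' under the 'regional' culture while the UI culture is 'ui'.
    public static string Format (double value, string regional, string ui)
    {
        Thread t = Thread.CurrentThread;
        CultureInfo oldCulture = t.CurrentCulture, oldUI = t.CurrentUICulture;
        try
        {
            t.CurrentCulture = new CultureInfo (regional);   // drives number/date formatting
            t.CurrentUICulture = new CultureInfo (ui);       // drives resource lookup only
            return value.ToString();                         // implicitly uses CurrentCulture
        }
        finally
        {
            t.CurrentCulture = oldCulture;
            t.CurrentUICulture = oldUI;
        }
    }
}
```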


ResourceManager, by default, uses the current thread’s CurrentUICulture property 
to determine the correct satellite assembly to load. ResourceManager uses a fallback 
mechanism when loading resources. If a subculture assembly is defined, that one is 
used; otherwise, it falls back to the generic culture. If the generic culture is not 
present, it falls back to the default culture in the main assembly. 
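You can picture the order of that search by walking CultureInfo.Parent until you reach the invariant culture, which stands in for the main assembly’s resources. This sketch only lists the candidate cultures; it does not replicate everything ResourceManager does (such as caching, or honoring the NeutralResourcesLanguage attribute). ResourceFallback is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class ResourceFallback
{
    // Lists culture names from most to least specific; "" is the invariant culture.
    public static List<string> Chain (CultureInfo culture)
    {
        var names = new List<string>();
        for (CultureInfo c = culture; ; c = c.Parent)
        {
            names.Add (c.Name);
            if (c.Equals (CultureInfo.InvariantCulture)) break;   // invariant is its own parent
        }
        return names;
    }
}
```

For example, Chain (new CultureInfo ("de-AT")) yields the subculture, then the generic German culture, then the invariant culture.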


Loading, Resolving, and Isolating Assemblies 


Loading an assembly from a known location is a relatively simple process. We refer 
to this as assembly loading. 


More commonly, however, you (or the CLR) will need to load an assembly knowing only its full (or simple) name. This is called assembly resolution. Assembly resolution differs from loading in that the assembly must first be located.


Assembly resolution is triggered in two scenarios: 


• By the CLR, when it needs to resolve a dependency
• Explicitly, when you call a method such as Assembly.Load(AssemblyName)
To illustrate the first scenario, consider an application comprising a main assembly
plus a set of statically referenced library assemblies (dependencies):


AdventureGame.dll    // Main assembly
Terrain.dll          // Referenced assembly
UIEngine.dll         // Referenced assembly


By “statically referenced,” we mean that AdventureGame.dll was compiled with references to Terrain.dll and UIEngine.dll. The compiler itself does not need to perform assembly resolution, because it’s told (either explicitly or by MSBuild) where to find Terrain.dll and UIEngine.dll. During compilation, it writes the full names of the Terrain and UIEngine assemblies into the metadata of AdventureGame.dll, but no information on where to find them. So, at runtime, the Terrain and UIEngine assemblies must be resolved.


Assembly loading and resolution is handled by an assembly load context (ALC); specifically, an instance of the AssemblyLoadContext class in System.Runtime.Loader. Because AdventureGame.dll is the main assembly for the application, the CLR uses the default ALC (AssemblyLoadContext.Default) to resolve its dependencies. The default ALC resolves dependencies first by looking for and examining a file called AdventureGame.deps.json (which describes where to find dependencies), or if not present, it looks in the application base folder, where it will find Terrain.dll and UIEngine.dll. (The default ALC also resolves .NET Core framework assemblies.)


As a developer, you can dynamically load additional assemblies during the execution of your program. For example, you might want to package optional features in assemblies that you deploy only when those features have been purchased. In such a case, you could load the extra assemblies, when present, by calling Assembly.Load(AssemblyName).
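A minimal sketch of that optional-feature pattern, with a hypothetical helper (OptionalFeatures) and feature name: Assembly.Load throws FileNotFoundException when the name can’t be resolved, so the helper turns that into a null result.

```csharp
using System.IO;
using System.Reflection;

public static class OptionalFeatures
{
    // Loads an assembly by simple name, or returns null if it can't be resolved.
    public static Assembly TryLoad (string simpleName)
    {
        try
        {
            return Assembly.Load (new AssemblyName (simpleName));
        }
        catch (FileNotFoundException)
        {
            return null;   // feature not deployed
        }
    }
}
```

For example, OptionalFeatures.TryLoad ("Hypothetical.PremiumFeature") returns null unless such an assembly has been deployed alongside the application.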


A more complex example would be implementing a plug-in system whereby the 
user can provide third-party assemblies that your application detects and loads at 
runtime to extend your application’s functionality. The complexity arises because 
each plug-in assembly might have its own dependencies that must also be resolved. 


By subclassing AssemblyLoadContext and overriding its assembly resolution method (Load), you can control how a plug-in finds its dependencies. For example, you might decide that each plug-in should reside in its own folder, and its dependencies should also reside in that folder.
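As a sketch of that folder-per-plug-in policy (FolderLoadContext is a hypothetical name), the override below looks for a file named after the requested simple name in the plug-in’s folder and returns null otherwise, which lets the runtime fall back to the default ALC for shared and framework assemblies. A real plug-in host would need more care, for instance by using AssemblyDependencyResolver, which we cover later in this section.

```csharp
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical ALC that looks for a plug-in's dependencies in one folder.
public class FolderLoadContext : AssemblyLoadContext
{
    readonly string _folder;

    public FolderLoadContext (string folder) : base (folder)   // ALC name = folder
    {
        _folder = Path.GetFullPath (folder);   // LoadFromAssemblyPath needs an absolute path
    }

    // Called when this ALC must resolve an assembly name.
    // Returning null lets the runtime fall back to the default ALC.
    protected override Assembly Load (AssemblyName assemblyName)
    {
        string candidate = Path.Combine (_folder, assemblyName.Name + ".dll");
        return File.Exists (candidate) ? LoadFromAssemblyPath (candidate) : null;
    }
}
```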


ALCs have another purpose: by instantiating a separate AssemblyLoadContext for each (plug-in + dependencies), you can keep each isolated, ensuring that their dependencies load side by side and do not interfere with one another (or with the host application). Each, for instance, can have its own version of JSON.NET. Hence, in addition to loading and resolution, ALCs also provide a mechanism for isolation. Under certain conditions, ALCs can even be unloaded, freeing their memory.


In this section, we elaborate on each of these principles and describe the following:

• How ALCs handle loading and resolution
• The role of the default ALC
• Assembly.Load and contextual ALCs
• How to use AssemblyDependencyResolver
• How to load and resolve unmanaged libraries
• Unloading ALCs
• The legacy assembly loading methods


Then, we put the theory to work and demonstrate how to write a plug-in system 
with ALC isolation. 


The AssemblyLoadContext class is new to .NET Core. In .NET
Framework, ALCs were present but restricted and hidden: the
only way to create and interact with them was indirectly via
the LoadFile(string), LoadFrom(string), and Load(byte[])
static methods on the Assembly class. Compared to the ALC
API, these methods are inflexible, and their use can lead to
surprises (particularly when handling dependencies). For this
reason, it’s best to favor explicit use of the AssemblyLoadContext
API in .NET Core.


Assembly Load Contexts 


As we just discussed, the AssemblyLoadContext class is responsible for loading and 
resolving assemblies as well as providing a mechanism for isolation. 


Every .NET Assembly object belongs to exactly one AssemblyLoadContext. You can 
obtain the ALC for an assembly as follows: 


Assembly assem = Assembly.GetExecutingAssembly();
AssemblyLoadContext context = AssemblyLoadContext.GetLoadContext (assem); 
Console.WriteLine (context.Name); 


Conversely, you can think of an ALC as containing or owning assemblies, which you 
can obtain via its Assemblies property. Following on from the previous example: 


foreach (Assembly a in context.Assemblies)
  Console.WriteLine (a.FullName);


The AssemblyLoadContext class also has a static All property that enumerates all 
ALCs. 
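The following sketch puts these properties together. It prints whether the executing assembly lives in the default ALC, then lists each ALC along with the number of assemblies it owns. The AlcInfo class and its Describe helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;

public static class AlcInfo
{
  // One line per ALC in the process: its name and how many assemblies it owns.
  public static string[] Describe() =>
    AssemblyLoadContext.All
      .Select (alc => $"{alc.Name}: {alc.Assemblies.Count()} assemblies")
      .ToArray();

  static void Main()
  {
    Assembly assem = Assembly.GetExecutingAssembly();
    AssemblyLoadContext context = AssemblyLoadContext.GetLoadContext (assem);

    // A startup assembly normally lives in the default ALC.
    Console.WriteLine (context == AssemblyLoadContext.Default);

    foreach (string line in Describe())
      Console.WriteLine (line);
  }
}
```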


You can create a new ALC just by instantiating AssemblyLoadContext and providing
a name (the name is helpful when debugging), although more commonly, you'd
first subclass AssemblyLoadContext so that you can implement logic to resolve
dependencies; in other words, load an assembly from its name.
Loading assemblies


AssemblyLoadContext provides the following methods to explicitly load an assembly
into its context:


public Assembly LoadFromAssemblyPath (string assemblyPath);
public Assembly LoadFromStream (Stream assembly, Stream assemblySymbols);


The first method loads an assembly from a file path, whereas the second method
loads it from a Stream (which can come directly from memory). The second parameter
is optional and corresponds to the contents of a project debug (.pdb) file, which
allows stack traces to include source code information when code executes (useful
in exception reporting).


With both of these methods, no resolution takes place. The following loads the 
assembly c:\temp\foo.dll into its own ALC: 


var alc = new AssemblyLoadContext ("Test"); 
Assembly assem = alc.LoadFromAssemblyPath (@"c:\temp\foo.dll");


If the assembly is valid, loading will always succeed, subject to one important rule: 
its simple name must be unique within its ALC. This means that you cannot load 
multiple versions of the same-named assembly into a single ALC; to do this, you 
must create additional ALCs. We could load another copy of foo.dll as follows:


var alc2 = new AssemblyLoadContext ("Test 2"); 
Assembly assem2 = alc2.LoadFromAssemblyPath (@"c:\temp\foo.dll"); 


Note that types that originate from different Assembly objects are incompatible even 
if the assemblies are otherwise identical. In our example, the types in assem are 
incompatible with the types in assem2. 
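The following sketch makes the type-incompatibility rule concrete. Instead of assuming that a c:\temp\foo.dll exists, it reloads the program's own assembly file into two new ALCs. It uses the Location property, which assumes a normal, non-single-file build. The Widget and IsolationDemo names are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

public class Widget { }

public static class IsolationDemo
{
  // Loads a fresh copy of this program's own assembly file into a new, named ALC.
  public static Assembly LoadCopy (string alcName) =>
    new AssemblyLoadContext (alcName).LoadFromAssemblyPath (typeof (Widget).Assembly.Location);

  static void Main()
  {
    Type t1 = LoadCopy ("Copy 1").GetType ("Widget");
    Type t2 = LoadCopy ("Copy 2").GetType ("Widget");

    Console.WriteLine (t1.FullName == t2.FullName);               // True: same name...
    Console.WriteLine (t1 == t2);                                 // False: ...but distinct types
    Console.WriteLine (Activator.CreateInstance (t1) is Widget);  // False: the default ALC's Widget is a third type
  }
}
```

Because each copy has its own Assembly object, a cast between the copies fails even though the bytes on disk are identical.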


After an assembly is loaded, it cannot be unloaded except by unloading its ALC (see
“Unloading ALCs” on page 789). The CLR maintains a lock on the file for the duration
that it’s loaded.


Avoid locking the file by loading the assembly via a byte array: 


byte[] bytes = File.ReadAllBytes (@"c:\temp\foo.dll");
var ms = new MemoryStream (bytes);
var assem = alc.LoadFromStream (ms);


This has two drawbacks:

• The assembly’s Location property will end up blank.
  Sometimes, it’s useful to know where an assembly was
  loaded from (and some APIs rely on it being populated).

• Private memory consumption must increase immediately
  to accommodate the full size of the assembly. If you
  instead load from a filename, the CLR uses a memory-mapped
  file, which enables lazy loading and process
  sharing. Also, should memory run low, the OS can
  release its memory and reload as required without
  writing to a page file.
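Here is a self-contained version of the byte-array technique. For illustration, it reloads the running program's own file instead of c:\temp\foo.dll, which assumes a non-single-file build. ByteLoadDemo and LoadWithoutLocking are hypothetical names.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

public static class ByteLoadDemo
{
  // Reads an assembly file into memory and loads it from there, so the CLR
  // never memory-maps (or locks) the file itself.
  public static Assembly LoadWithoutLocking (string path, AssemblyLoadContext alc)
  {
    byte[] bytes = File.ReadAllBytes (path);
    using var ms = new MemoryStream (bytes);
    return alc.LoadFromStream (ms);   // Copies the stream's contents into the ALC
  }

  static void Main()
  {
    string path = typeof (ByteLoadDemo).Assembly.Location;
    Assembly assem = LoadWithoutLocking (path, new AssemblyLoadContext ("FromBytes"));

    Console.WriteLine (assem.GetName().Name);
    Console.WriteLine (assem.Location == "");   // True: the first drawback listed above
  }
}
```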
LoadFromAssemblyName


AssemblyLoadContext also provides the following method, which loads an assembly 
by name: 


public Assembly LoadFromAssemblyName (AssemblyName assemblyName);


Unlike the two methods just discussed, you don’t pass in any information to indicate 
where the assembly is located; instead, you're instructing the ALC to resolve the
assembly. 


Resolving assemblies 


The preceding method triggers assembly resolution. The CLR also triggers assembly
resolution when loading dependencies. For example, suppose that assembly A statically
references assembly B. To resolve reference B, the CLR triggers assembly resolution
on whichever ALC assembly A was loaded into.


The CLR resolves dependencies by triggering assembly resolution,
whether the triggering assembly is in the default or a
custom ALC. The difference is that with the default ALC, the
resolution rules are hardcoded, whereas with a custom ALC,
you write the rules yourself.


Here's what then happens: 


1. The CLR first checks whether an identical resolution has already taken place in 
that ALC (with a matching full assembly name); if so, it returns the Assembly it 
returned before. 


2. Otherwise, it calls the ALC’s (virtual protected) Load method, which does the 
work of locating and loading the assembly. The default ALC’s Load method 
applies the rules we describe in “The Default ALC” on page 782. With a custom 
ALC, it’s entirely up to you how you locate the assembly. For instance, you 
might look in some folder and then call LoadFromAssemblyPath when you find 
the assembly. It’s also perfectly legal to return an already-loaded assembly from 
the same or another ALC (we demonstrate this in “Writing a Plug-In System” 
on page 791). 


3. If Step 2 returns null, the CLR then calls the Load method on the default ALC
(this serves as a useful “fallback” for resolving Framework and common application
assemblies).


4. If Step 3 returns null, the CLR then fires the Resolving events on both ALCs— 
first, on the default ALC, and then on the original ALC. 


5. (For compatibility with .NET Framework) If the assembly still hasn’t been
resolved, the AppDomain.CurrentDomain.AssemblyResolve event fires.
After this process completes, the CLR does a “sanity check” to
ensure that whatever assembly was loaded has a name that’s
compatible with what was requested. The simple name must
match; the public key token must match if specified. The
version need not match: it can be higher or lower than what was
requested.


From this, we can see that there are two ways to implement assembly resolution in a 
custom ALC: 


Override the ALC’s Load method 
This gives your ALC “first say” over what happens, which is usually desirable 
(and essential when you need isolation). 


Handle the ALC’s Resolving event
This fires only after the default ALC has failed to resolve the assembly.


If you attach multiple event handlers to the Resolving event, 
the first to return a non-null value wins. 


To illustrate, let’s assume that we want to load an assembly that our main application 
knew nothing about at compile time, called foo.dll, located in c:\temp (which is
different from our application folder). We'll also assume that foo.dll has a private
dependency on bar.dll. We want to ensure that when we load c:\temp\foo.dll and 
execute its code, c:\temp\bar.dll can correctly resolve. We also want to ensure that 
foo and its private dependency, bar, do not interfere with the main application. 


Let’s begin by writing a custom ALC that overrides Load: 


using System.IO;
using System.Reflection;
using System.Runtime.Loader;

class FolderBasedALC : AssemblyLoadContext
{
  readonly string _folder;
  public FolderBasedALC (string folder) => _folder = folder;

  protected override Assembly Load (AssemblyName assemblyName)
  {
    // Attempt to find the assembly:
    string targetPath = Path.Combine (_folder, assemblyName.Name + ".dll");

    if (File.Exists (targetPath))
      return LoadFromAssemblyPath (targetPath);   // Load the assembly

    return null;   // We can't find it - it could be a framework assembly
  }
}


Notice that in the Load method, we return null if the assembly file is not present. 
This check is important because foo.dll will also have dependencies on the .NET 
Core framework assemblies; hence, the Load method will be called on assemblies 
such as System.Runtime. By returning null, we allow the CLR to fall back to the
default ALC, which will correctly resolve these assemblies.


Notice that we didn’t attempt to load the .NET Core framework
assemblies into our own ALC. The framework assemblies are not
designed to run outside the default ALC, and attempts to load
them into your own ALC can result in incorrect behavior,
performance degradation, and unexpected type incompatibility.

Here's how we could use our custom ALC to load the foo.dll assembly in c:\temp:


var alc = new FolderBasedALC (@"c:\temp"); 
Assembly foo = alc.LoadFromAssemblyPath (@"c:\temp\foo.dll");


When we subsequently begin calling code in the foo assembly, the CLR will at some 
point need to resolve the dependency on bar.dll. This is when the custom ALC’s 
Load method will fire and successfully locate the bar.dll assembly in c:\temp.


In this case, our Load method is also capable of resolving foo.dll, so we could
simplify our code to this:


var alc = new FolderBasedALC (@"c:\temp"); 
Assembly foo = alc.LoadFromAssemblyName (new AssemblyName ("foo")); 
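A runnable variant of FolderBasedALC follows. It differs from the version above only in that it passes a name ("FolderBased") to the base constructor. For illustration, it treats the program's own output folder (AppContext.BaseDirectory) as the plug-in folder. It also assumes a framework-dependent deployment, so System.Runtime.dll is not present in that folder. FolderDemo is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

class FolderBasedALC : AssemblyLoadContext
{
  readonly string _folder;

  public FolderBasedALC (string folder) : base ("FolderBased") => _folder = folder;

  protected override Assembly Load (AssemblyName assemblyName)
  {
    string targetPath = Path.Combine (_folder, assemblyName.Name + ".dll");
    return File.Exists (targetPath)
      ? LoadFromAssemblyPath (targetPath)   // Found in our folder
      : null;                               // Defer to the default ALC
  }
}

static class FolderDemo
{
  static void Main()
  {
    var alc = new FolderBasedALC (AppContext.BaseDirectory);
    string myName = typeof (FolderDemo).Assembly.GetName().Name;

    Assembly copy = alc.LoadFromAssemblyName (new AssemblyName (myName));
    Console.WriteLine (AssemblyLoadContext.GetLoadContext (copy).Name);   // FolderBased
    Console.WriteLine (copy == typeof (FolderDemo).Assembly);             // False: an isolated copy

    // System.Runtime isn't in the folder, so Load returns null and the
    // default ALC supplies the framework assembly.
    Assembly runtime = alc.LoadFromAssemblyName (new AssemblyName ("System.Runtime"));
    Console.WriteLine (AssemblyLoadContext.GetLoadContext (runtime) == AssemblyLoadContext.Default);
  }
}
```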


Now, let’s consider an alternative solution: instead of subclassing
AssemblyLoadContext and overriding Load, we could instantiate a plain AssemblyLoadContext and
handle its Resolving event: 


var alc = new AssemblyLoadContext ("test");
alc.Resolving += (loadContext, assemblyName) =>
{
  string targetPath = Path.Combine (@"c:\temp", assemblyName.Name + ".dll");
  return alc.LoadFromAssemblyPath (targetPath);   // Load the assembly
};

Assembly foo = alc.LoadFromAssemblyName (new AssemblyName ("foo"));

Notice now that we don't need to check whether the assembly exists. Because the 
Resolving event fires after the default ALC has had a chance to resolve the assembly 
(and only when it fails), our handler won't fire for Framework assemblies. This 
makes this solution simpler, although there’s a disadvantage. Remember that in our 
scenario, the main application knew nothing about foo.dll or bar.dll at compile time. 
This means that it’s possible for the main application to itself depend on assemblies 
called foo.dll or bar.dll. If this were to occur, the Resolving event would never fire, 
and the application’s foo and bar assemblies would load instead. In other words, we
would fail to achieve isolation. 
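This isolation pitfall is easy to observe. The sketch below attaches a Resolving handler to a plain ALC and records which names reach it. A framework assembly that the default ALC can resolve never reaches the handler, whereas an unknown name does. ResolvingDemo and Probe are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

public static class ResolvingDemo
{
  // Returns the simple names for which the plain ALC's Resolving event fired.
  public static List<string> Probe (params string[] names)
  {
    var fired = new List<string>();
    var alc = new AssemblyLoadContext ("test");
    alc.Resolving += (context, assemblyName) =>
    {
      fired.Add (assemblyName.Name);
      return null;   // Decline, so resolution fails
    };

    foreach (string name in names)
    {
      try { alc.LoadFromAssemblyName (new AssemblyName (name)); }
      catch (FileNotFoundException) { }   // Nobody could resolve it
    }
    return fired;
  }

  static void Main()
  {
    // The default ALC resolves System.Private.Xml, so the handler never sees it.
    List<string> fired = Probe ("System.Private.Xml", "NoSuchAssembly");
    Console.WriteLine (string.Join (", ", fired));
  }
}
```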
Our FolderBasedALC class is good for illustrating the concept
of assembly resolution, but it’s of less use in real life because it
cannot handle platform-specific and (for library projects)
development-time NuGet dependencies. In “AssemblyDependencyResolver”
on page 788 we describe the solution to this
problem, and in “Writing a Plug-In System” on page 791, we
give a detailed example.


The Default ALC 


When an application starts, the CLR assigns a special ALC to the static
AssemblyLoadContext.Default property. The default ALC is where your startup assembly
loads, along with its statically referenced dependencies and the .NET Core Framework
assemblies.


The default ALC looks first in the default probing paths to automatically resolve
assemblies (see “Default probing” on page 783); this normally equates to the
locations indicated in the application’s .deps.json and .runtimeconfig.json files.


If the ALC cannot find an assembly in its default probing paths, its Resolving event 
fires. Handling this event lets you load the assembly from other locations, which 
means that you can deploy an application’s dependencies to additional locations, 
such as subfolders, shared folders, or even as a binary resource inside the host 
assembly: 


AssemblyLoadContext.Default.Resolving += (loadContext, assemblyName) =>
{
  // Try to locate assemblyName, returning an Assembly object or null.
  // Typically you'd call LoadFromAssemblyPath after finding the file.
  // ...
  return null;
};

The Resolving event in the default ALC also fires when a custom ALC fails to 
resolve (in other words, when its Load method returns null), and the default ALC is 
unable to resolve the assembly. 


You can also load assemblies into the default ALC from outside the Resolving 
event. Before proceeding, however, you should first determine whether you can 
solve the problem better by using a separate ALC or with the approaches we 
describe in the following section (which use the executing and contextual ALCs). 
Hardcoding to the default ALC makes your code brittle because it can no longer
be isolated as a whole (for instance, by unit testing frameworks, or by LINQPad).


If you still want to proceed, it’s preferable to call a resolution method (i.e.,
LoadFromAssemblyName) rather than a loading method (such as LoadFromAssemblyPath),
especially if your assembly is statically referenced. This is because it’s possible that 
the assembly might already be loaded, in which case LoadFromAssemblyName will 
return the already-loaded assembly, whereas LoadFromAssemblyPath will throw an 
exception. 
(With LoadFromAssemblyPath, you can also run the risk of loading the assembly
from a place that’s inconsistent with where the ALC’s default resolution mechanism 
would find it.) 


If the assembly is in a place where the ALC won't automatically find it, you can still 
follow this procedure and additionally handle the ALC’s Resolving event. 


Note that when calling LoadFromAssemblyName, you don't need to provide the full
name; the simple name will do (and is valid even if the assembly is strongly named): 


AssemblyLoadContext.Default.LoadFromAssemblyName (new AssemblyName ("System.Xml"));


However, if you include the public key token in the name, it must match with what's 
loaded. 
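The following sketch shows that resolution returns the already-loaded assembly rather than loading a second copy. It uses System.Private.Xml, which the default ALC holds once a type from it (System.Xml.Formatting) has been referenced. ResolveByNameDemo is a hypothetical name.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

public static class ResolveByNameDemo
{
  // Resolves a simple name in the default ALC. If the assembly is already
  // loaded there, the existing Assembly object comes back.
  public static Assembly Resolve (string simpleName) =>
    AssemblyLoadContext.Default.LoadFromAssemblyName (new AssemblyName (simpleName));

  static void Main()
  {
    Assembly viaType = typeof (System.Xml.Formatting).Assembly;   // System.Private.Xml
    Assembly viaName = Resolve ("System.Private.Xml");
    Console.WriteLine (viaName == viaType);   // True
  }
}
```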


Default probing 


The default probing paths normally comprise the following: 


• Paths specified in AppName.deps.json (where AppName is the name of your
  application’s main assembly). If this file is not present, the application base
  folder is used instead.

• Folders containing the .NET Core Framework assemblies (if your application is
  Framework-dependent).


MSBuild automatically generates a file called AppName.deps.json, which describes 
where to find all of its dependencies. These include platform-agnostic assemblies, 
which are placed in the application base folder, and platform-specific assemblies, 
which are placed in the runtimes\ subdirectory under a subfolder such as win or 
unix. 


The paths specified in the generated .deps.json file are relative to the application base
folder, or to any additional folders that you specify in the additionalProbingPaths
section of the AppName.runtimeconfig.json and/or AppName.runtimeconfig.dev.json 
configuration files (the latter is intended only for the development environment). 
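On .NET Core, you can inspect the outcome of this probing. The host passes the list of assembly files that the default ALC can resolve to the runtime as a property named TRUSTED_PLATFORM_ASSEMBLIES, which you can read with AppContext.GetData. Because this is a host-supplied value, treat the sketch below as a diagnostic aid rather than something to build logic on. ProbingDemo is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ProbingDemo
{
  // Splits the host-supplied TRUSTED_PLATFORM_ASSEMBLIES list (framework assemblies
  // plus those listed in AppName.deps.json) into individual file paths.
  public static string[] TrustedPlatformAssemblies() =>
    ((string) AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES") ?? "")
      .Split (Path.PathSeparator, StringSplitOptions.RemoveEmptyEntries);

  static void Main()
  {
    foreach (string path in TrustedPlatformAssemblies().Take (5))
      Console.WriteLine (path);
  }
}
```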


The Current ALC 


In the preceding section, we cautioned against explicitly loading assemblies into the 
default ALC. What you usually want, instead, is to load/resolve into the current 
ALC. 


In most cases, the current ALC is the one containing the currently executing assembly:

var executingAssem = Assembly.GetExecutingAssembly();
var alc = AssemblyLoadContext.GetLoadContext (executingAssem);

Assembly assem = alc.LoadFromAssemblyName (...);   // to resolve by name
// OR:
// Assembly assem = alc.LoadFromAssemblyPath (...);   // to load by path

Here's a more flexible and explicit way to obtain the ALC:


var myAssem = typeof (SomeTypeInMyAssembly).Assembly; 
var alc = AssemblyLoadContext.GetLoadContext (myAssem); 


Sometimes, it’s impossible to infer the current ALC. For example, suppose that you
were responsible for writing the .NET Core binary serializer that we covered in
Chapter 17. A serializer such as this writes the full names of the types that it
serializes (including their assembly names), which must be resolved during
deserialization. The question is, which ALC should you use? The problem with relying on the
executing assembly is that it will return whatever assembly contains the deserializer,
not the assembly that’s calling the deserializer.


The best solution is not to guess, but to ask: 


public object Deserialize (Stream stream, AssemblyLoadContext alc)
{
  ...
}


Being explicit maximizes flexibility and minimizes the chance of making mistakes. 
The caller can now decide what should count as the “current” ALC: 


var assem = typeof (SomeTypeThatIWillBeDeserializing).Assembly; 
var alc = AssemblyLoadContext.GetLoadContext (assem); 
object obj = Deserialize (someStream, alc);
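Here is a minimal sketch of the "ask, don't guess" approach applied to type-name resolution, which is the core step of such a deserializer. TypeNameResolver is hypothetical. The sketch deliberately ignores generic type names, because their nested commas would defeat its simple parsing.

```csharp
using System;
using System.Reflection;
using System.Runtime.Loader;

public static class TypeNameResolver
{
  // Resolves "Namespace.Type, AssemblyName[, Version=..., ...]" against the ALC
  // that the caller supplies, rather than inferring one.
  public static Type Resolve (string assemblyQualifiedName, AssemblyLoadContext alc)
  {
    int comma = assemblyQualifiedName.IndexOf (',');
    if (comma < 0)
      throw new ArgumentException ("Expected an assembly-qualified type name.",
                                   nameof (assemblyQualifiedName));

    string typeName = assemblyQualifiedName.Substring (0, comma).Trim();
    string assemblyName = assemblyQualifiedName.Substring (comma + 1).Trim();

    Assembly assem = alc.LoadFromAssemblyName (new AssemblyName (assemblyName));
    return assem.GetType (typeName, throwOnError: true);
  }

  static void Main()
  {
    Type expected = typeof (System.Xml.Formatting);
    var alc = AssemblyLoadContext.GetLoadContext (expected.Assembly);
    Console.WriteLine (Resolve (expected.AssemblyQualifiedName, alc) == expected);   // True
  }
}
```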


Assembly.Load and Contextual ALCs 


To help with the common case of loading an assembly into the currently executing 
ALC; that is: 


var executingAssem = Assembly.GetExecutingAssembly();
var alc = AssemblyLoadContext.GetLoadContext (executingAssem);
Assembly assem = alc.LoadFromAssemblyName (...);


Microsoft has defined the following method in the Assembly class: 
public static Assembly Load (string assemblyString); 

as well as a functionally identical version that accepts an AssemblyName object: 
public static Assembly Load (AssemblyName assemblyRef); 


(Don't confuse these methods with the legacy Load(byte[]) method, which behaves
in a totally different manner; see “The Legacy Loading Methods” on page 789.)


As with LoadFromAssemblyName, you have a choice of specifying the assembly’s
simple, partial, or full name:


Assembly a = Assembly.Load ("System.Private.Xml");


This loads the System.Private.Xml assembly into whatever ALC the executing 
code’s assembly is loaded in. 
In this case, we specified a simple name. The following strings would also be valid,
and all would have the same result in .NET Core 3:

"System.Private.Xml, PublicKeyToken=cc7b13ffcd2ddd51"
"System.Private.Xml, Version=4.0.1.0"
"System.Private.Xml, Version=4.0.1.0, PublicKeyToken=cc7b13ffcd2ddd51"


If you choose to specify a public key token, it must match with what’s loaded. 
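You can check this equivalence directly. The version and token shown above are specific to .NET Core 3, so rather than hardcoding them, the sketch below derives them from the System.Private.Xml assembly that is actually loaded. NameVariantsDemo is a hypothetical name.

```csharp
using System;
using System.Reflection;

public static class NameVariantsDemo
{
  // Builds simple, partial, and full names from the loaded System.Private.Xml,
  // so the version and public key token always match the running runtime.
  public static string[] Variants()
  {
    AssemblyName loaded = typeof (System.Xml.Formatting).Assembly.GetName();
    string token = BitConverter.ToString (loaded.GetPublicKeyToken())
                               .Replace ("-", "").ToLowerInvariant();
    return new[]
    {
      loaded.Name,                                  // simple
      $"{loaded.Name}, PublicKeyToken={token}",     // partial
      $"{loaded.Name}, Version={loaded.Version}",   // partial
      loaded.FullName                               // full
    };
  }

  static void Main()
  {
    Assembly expected = typeof (System.Xml.Formatting).Assembly;
    foreach (string name in Variants())
      Console.WriteLine ($"{Assembly.Load (name) == expected}  {name}");
  }
}
```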


The Microsoft Developer Network (MSDN) cautions against 
loading an assembly from a partial name, recommending that 
you specify the exact version and public key token. Their 
rationale is based on factors relevant to .NET Framework, 
such as the effects of the Global Assembly Cache and Code 
Access Security. In .NET Core, these factors aren't present, and 
it’s generally safe to load from a simple or partial name. 


Both of these methods are strictly for resolution, so you cannot specify a file path. (If 
you populate the CodeBase property in the AssemblyName object, it will be ignored.) 


Don't fall into the trap of using Assembly.Load to load a
statically referenced assembly. All you need do in this case is refer
to a type in the assembly, and obtain the assembly from that:

Assembly a = typeof (System.Xml.Formatting).Assembly;

Or, you could even do this:

Assembly a = System.Xml.Formatting.Indented.GetType().Assembly;

This avoids hardcoding the assembly name (which you might
change in the future) while triggering assembly resolution on
the executing code’s ALC (as would happen with Assembly.Load).


If you were to write the Assembly.Load method yourself, it would (almost) look like 
this: 


[MethodImpl (MethodImplOptions.NoInlining)]
Assembly Load (string name)
{
  Assembly callingAssembly = Assembly.GetCallingAssembly();
  var callingAlc = AssemblyLoadContext.GetLoadContext (callingAssembly);
  return callingAlc.LoadFromAssemblyName (new AssemblyName (name));
}


EnterContextualReflection 


Assembly.Load’s strategy of using the calling assembly’s ALC fails when
Assembly.Load is called via an intermediary, such as a deserializer or unit test
runner. If the intermediary is defined in a different assembly, the intermediary’s load
context is used instead of the caller’s load context.
We described this scenario earlier, when we talked about how
you might write a deserializer. In such cases, the ideal solution
is to force the caller to specify an ALC rather than inferring it
with Assembly.Load(string).

But because .NET Core evolved from .NET Framework, where
isolation was accomplished with application domains
rather than ALCs, the ideal solution is not prevalent, and
Assembly.Load(string) is sometimes used inappropriately in
scenarios in which the ALC cannot be reliably inferred. An
example is the .NET Core binary serializer.


To allow Assembly.Load to still work in such scenarios, Microsoft has added 
a method to AssemblyLoadContext called EnterContextualReflection. This 
assigns an ALC to AssemblyLoadContext.CurrentContextualReflectionContext. 
Although this is a static property, its value is stored in an AsyncLocal variable, so it 
can hold separate values on different threads (but still be preserved throughout 
asynchronous operations). 


If this property is non-null, Assembly.Load automatically uses it in preference to the 
calling ALC: 


Method1();

var myALC = new AssemblyLoadContext ("test");
using (myALC.EnterContextualReflection())
{
  Console.WriteLine (
    AssemblyLoadContext.CurrentContextualReflectionContext.Name);   // test

  Method2();
}
// Once disposed, EnterContextualReflection() no longer has an effect.
Method3();

void Method1() => Assembly.Load ("...");    // Will use calling ALC
void Method2() => Assembly.Load ("...");    // Will use myALC
void Method3() => Assembly.Load ("...");    // Will use calling ALC
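You can observe the AsyncLocal behavior with a short sketch. The scope's ALC remains visible after an await and inside Task.Run, and it disappears once the scope is disposed. ContextualReflectionDemo is a hypothetical name. The results assume that no contextual reflection context was set before the program started, which is the usual case for a console application.

```csharp
using System;
using System.Runtime.Loader;
using System.Threading.Tasks;

public static class ContextualReflectionDemo
{
  static string CurrentName() => AssemblyLoadContext.CurrentContextualReflectionContext?.Name;

  // Captures the contextual reflection context before, inside, and after the scope.
  public static async Task<string[]> Observe()
  {
    string before = CurrentName();
    var myALC = new AssemblyLoadContext ("test");

    string afterAwait, onPoolThread;
    using (myALC.EnterContextualReflection())
    {
      await Task.Yield();                                   // Continue asynchronously
      afterAwait = CurrentName();
      onPoolThread = await Task.Run (() => CurrentName());  // AsyncLocal flows into Task.Run
    }
    string after = CurrentName();
    return new[] { before, afterAwait, onPoolThread, after };
  }

  static async Task Main()
  {
    string[] seen = await Observe();
    Console.WriteLine (string.Join (" | ", Array.ConvertAll (seen, s => s ?? "(null)")));
    // (null) | test | test | (null)
  }
}
```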


We previously demonstrated how you could write a method that’s functionally similar
to Assembly.Load. Here’s a more accurate version that takes the contextual reflection
context into account:


[MethodImpl (MethodImplOptions.NoInlining)]
Assembly Load (string name)
{
  var alc = AssemblyLoadContext.CurrentContextualReflectionContext
         ?? AssemblyLoadContext.GetLoadContext (Assembly.GetCallingAssembly());

  return alc.LoadFromAssemblyName (new AssemblyName (name));
}
Even though the contextual reflection context can be useful in allowing legacy code
to run, a more robust solution (as we described earlier) is to modify the code that 
calls Assembly.Load so that it instead calls LoadFromAssemblyName on an ALC that’s 
passed in by the caller. 


.NET Framework has no equivalent of EnterContextualReflection
(and does not need it), despite having the same
Assembly.Load methods. This is because with .NET Framework,
isolation is accomplished primarily with application
domains rather than ALCs. Application domains provide a 
stronger isolation model whereby each application domain 
has its own default load context, so isolation can still work 
even when only the default load context is used. 


Loading and Resolving Unmanaged Libraries 


ALCs can also load and resolve native libraries. Native resolution is triggered when 
you call an external method that’s marked with the [DllImport] attribute: 


[DllImport ("SomeNativeLibrary.dll")] 
static extern int SomeNativeMethod (string text); 


Because we didn't specify a full path in the [DllImport] attribute, calling
SomeNativeMethod triggers a resolution in whatever ALC contains the assembly in which
SomeNativeMethod is defined. 


The virtual resolving method in the ALC is called LoadUnmanagedDll, and the loading
method is called LoadUnmanagedDllFromPath:


protected override IntPtr LoadUnmanagedDll (string unmanagedDllName)
{
  // Locate the full path of unmanagedDllName...
  string fullPath = ...
  return LoadUnmanagedDllFromPath (fullPath);   // Load the DLL
}


If you're unable to locate the file, you can return IntPtr.Zero. The CLR will then 
fire the ALC’s ResolvingUnmanagedDll event.


Interestingly, the LoadUnmanagedDllFromPath method is protected, so you won't
usually be able to call it from a ResolvingUnmanagedDll event handler. However,
you can achieve the same result by calling the static NativeLibrary.Load: 


someALC.ResolvingUnmanagedDll += (requestingAssembly, unmanagedDllName) =>
{
  return NativeLibrary.Load ("(full path to unmanaged DLL)");
};

Although native libraries are typically resolved and loaded by ALCs, they don't
“belong” to an ALC. After it’s loaded, a native library stands on its own and takes
responsibility for resolving any transitive dependencies that it might have.
Furthermore, native libraries are global to the process, so it’s not possible to load two
different versions of a native library if they have the same filename.
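As a sketch of working with native libraries directly through the static NativeLibrary class, the following loads a system library and checks for an export. The library and export names are assumptions about typical systems: kernel32.dll with GetCurrentProcessId on Windows, /usr/lib/libSystem.dylib with getpid on macOS, and libc.so.6 with getpid on glibc-based Linux. NativeDemo is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeDemo
{
  // Assumed library/export pairs per platform (other Linux distributions may differ).
  static (string Library, string Export) PlatformDefaults() =>
    RuntimeInformation.IsOSPlatform (OSPlatform.Windows) ? ("kernel32.dll", "GetCurrentProcessId")
    : RuntimeInformation.IsOSPlatform (OSPlatform.OSX) ? ("/usr/lib/libSystem.dylib", "getpid")
    : ("libc.so.6", "getpid");

  // Loads a native library by name or path; returns IntPtr.Zero if it can't be loaded.
  public static IntPtr TryLoad (string libraryPath) =>
    NativeLibrary.TryLoad (libraryPath, out IntPtr handle) ? handle : IntPtr.Zero;

  static void Main()
  {
    var (library, export) = PlatformDefaults();
    IntPtr handle = TryLoad (library);
    if (handle == IntPtr.Zero)
    {
      Console.WriteLine ($"Could not load {library}");
      return;
    }

    // The handle is process-global; it isn't owned by any ALC.
    Console.WriteLine (NativeLibrary.TryGetExport (handle, export, out _));
    NativeLibrary.Free (handle);
  }
}
```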
AssemblyDependencyResolver


In “Default probing” on page 783, we said that the default ALC reads the .deps.json 
and .runtimeconfig.json files, if present, in determining where to look to resolve 
platform-specific and development-time NuGet dependencies. 


If you want to load an assembly into a custom ALC that has platform-specific or
NuGet dependencies, you'll need to somehow reproduce this logic. You could
accomplish this by parsing the configuration files and carefully following the rules
on platform-specific monikers, but not only is doing so difficult, the code that
you write will also break if the rules change in a later version of .NET Core.


The AssemblyDependencyResolver class solves this problem. To use it, you instantiate
it with the path of the assembly whose dependencies you want to probe:


var resolver = new AssemblyDependencyResolver (@"c:\temp\foo.dll"); 


Then, to find the path of a dependency, you call the ResolveAssemblyToPath 
method: 


string path = resolver.ResolveAssemblyToPath (new AssemblyName ("bar")); 


In the absence of a .deps.json file (or if the .deps.json doesn't contain anything
relevant to bar.dll), this will evaluate to c:\temp\bar.dll.


You can similarly resolve unmanaged dependencies by calling
ResolveUnmanagedDllToPath.
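A normally built application has its own .deps.json, so a self-contained sketch can simply point AssemblyDependencyResolver at the program's own main assembly. This assumes a non-single-file build, so that Location is populated. DependencyDemo is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;

public static class DependencyDemo
{
  // Asks the resolver where a dependency of the given component would be found;
  // ResolveAssemblyToPath returns null when there is no match.
  public static string Resolve (string componentPath, string dependencyName)
  {
    var resolver = new AssemblyDependencyResolver (componentPath);
    return resolver.ResolveAssemblyToPath (new AssemblyName (dependencyName));
  }

  static void Main()
  {
    Assembly self = typeof (DependencyDemo).Assembly;

    // The component's own assembly resolves to its file in the output folder...
    Console.WriteLine (Path.GetFileName (Resolve (self.Location, self.GetName().Name)));

    // ...whereas a name that appears nowhere yields null.
    Console.WriteLine (Resolve (self.Location, "NoSuchAssembly") == null);   // True
  }
}
```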


A great way to illustrate a more complex scenario is to create a new Console project 
called ClientApp and then add a NuGet reference to Microsoft.Data.SqlClient. Add 
the following class: 


using Microsoft.Data.SqlClient;

namespace ClientApp
{
  public class Program
  {
    public static SqlConnection GetConnection() => new SqlConnection();
    static void Main() => GetConnection();   // Test that it resolves
  }
}


Now build the application and look in the output folder: you'll see a file called 
Microsoft.Data.SqlClient.dll. However, this file never loads when run, and 
attempting to explicitly load it throws an exception. The assembly that actually 
loads is located in the runtimes\win (or runtimes/unix) subfolder; the default ALC 
knows to load it because it parses the ClientApp.deps.json file. 


If you were to try to load the ClientApp.dll assembly from another application, you'd
need to write an ALC that can resolve its dependency, Microsoft.Data.SqlClient.dll.
In doing so, it would be insufficient to merely look in the folder where ClientApp.dll
is located (as we did in “Resolving assemblies” on page 779). Instead, you'd need to
use AssemblyDependencyResolver to determine where that file is located for the
platform in use:


string path = @"C:\source\ClientApp\bin\Debug\netcoreapp3.0\ClientApp.dll"; 
var resolver = new AssemblyDependencyResolver (path); 

var sqlClient = new AssemblyName ("Microsoft.Data.SqlClient"); 
Console.WriteLine (resolver.ResolveAssemblyToPath (sqlClient));


On a Windows machine, this outputs the following: 


C:\source\ClientApp\bin\Debug\netcoreapp3.0\runtimes\win\lib\netcoreapp2 
\Microsoft.Data.SqlClient.dll


We give a complete example in “Writing a Plug-In System” on page 791. 


Unloading ALCs 


In simple cases, it’s possible to unload a nondefault AssemblyLoadContext, freeing 
memory and releasing file locks on the assemblies it loaded. For this to work, the 
ALC must have been instantiated with the isCollectible parameter set to true:


var alc = new AssemblyLoadContext ("test", isCollectible: true); 
You can then call the Unload method on the ALC to initiate the unload process. 


The unload model is cooperative rather than preemptive. If any methods in any of 
the ALC’s assemblies are executing, the unload will be deferred until those methods 
finish. 


The actual unload takes place during garbage collection; it will not take place if
anything from outside the ALC has any (nonweak) reference to anything inside the
ALC (including objects, types, and assemblies). It’s not uncommon for APIs 
(including those in the .NET Core Framework) to cache objects in static fields or 
dictionaries—or subscribe to events—and this makes it easy to create references that 
will prevent an unload, especially if code in the ALC uses APIs outside its ALC in a 
nontrivial way. Determining the cause of a failed unload is difficult and requires the 
use of tools such as WinDbg. 
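You can demonstrate the cooperative unload, and the need to drop every outside reference, by holding only a WeakReference to the ALC. This follows the pattern commonly used to verify unloading. The sketch reloads the program's own file as a stand-in for a plug-in, which assumes a non-single-file build. UnloadDemo and its members are hypothetical names. If TryUnload returns false, something outside the ALC is still holding a reference.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.Loader;

public static class UnloadDemo
{
  // NoInlining keeps the ALC and Assembly references confined to this frame,
  // so nothing outside the ALC refers to it once the method returns.
  [MethodImpl (MethodImplOptions.NoInlining)]
  static WeakReference LoadUseAndUnload (string path)
  {
    var alc = new AssemblyLoadContext ("Collectible", isCollectible: true);
    Assembly assem = alc.LoadFromAssemblyPath (path);
    Console.WriteLine ($"Loaded {assem.GetName().Name}");
    alc.Unload();   // Initiates the unload; it completes during a later GC
    return new WeakReference (alc);
  }

  // Returns true if the ALC was collected within a bounded number of GCs.
  public static bool TryUnload (string path)
  {
    WeakReference alcRef = LoadUseAndUnload (path);
    for (int i = 0; alcRef.IsAlive && i < 10; i++)
    {
      GC.Collect();
      GC.WaitForPendingFinalizers();
    }
    return !alcRef.IsAlive;
  }

  static void Main() =>
    Console.WriteLine (TryUnload (typeof (UnloadDemo).Assembly.Location));
}
```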


The Legacy Loading Methods 


If you're still using .NET Framework (or writing a library that targets .NET
Standard and want to support .NET Framework), you won't be able to use the
AssemblyLoadContext class. Loading is accomplished instead by using the following
methods:


public static Assembly LoadFrom (string assemblyFile);
public static Assembly LoadFile (string path); 
public static Assembly Load (byte[] rawAssembly); 


LoadFile and Load(byte[]) provide isolation, whereas LoadFrom does not. 


Resolution is accomplished by handling the application domain’s AssemblyResolve 
event, which works like the default ALC’s Resolving event. 
The Assembly.Load(string) method is also available to trigger resolution and
works in a similar way.


LoadFrom 


LoadFrom loads an assembly from a given path into the default ALC. It’s a bit like 
calling AssemblyLoadContext.Default.LoadFromAssemblyPath except for the 
following: 


• If an assembly with the same simple name is already present in the default
  ALC, LoadFrom returns that assembly rather than throwing an exception.

• If an assembly with the same simple name is not already present in the default
  ALC, and a load takes place, the assembly is given a special LoadFrom status.
  This status affects the default ALC’s resolution logic, in that should that
  assembly have any dependencies in the same folder, those dependencies will resolve
  automatically.


.NET Framework has a Global Assembly Cache (GAC). If the 
assembly is present in the GAC, the CLR will always load from 
there instead. This applies to all three loading methods. 


LoadFrom’s ability to automatically resolve transitive same-folder dependencies can
be convenient, until it loads an assembly that it shouldn't. Because such scenarios
can be difficult to debug, it can be better to use Load(string) or LoadFile and
resolve transitive dependencies by handling the application domain’s
AssemblyResolve event. This gives you the power to decide how to resolve each assembly and
allows for debugging (by creating a breakpoint inside the event handler).


LoadFile and Load(byte[]) 


LoadFile and Load(byte[]) load an assembly from a given file path or byte array
into a new ALC. Unlike LoadFrom, these methods provide isolation and let you load
multiple versions of the same assembly. However, there are two caveats:


• Calling LoadFile again with the identical path will return the previously loaded
assembly.


• In .NET Framework, both methods first check the GAC and load from there
instead if the assembly is present.


With LoadFile and Load(byte[]), you end up with a separate ALC per assembly 
(caveats aside). This enables isolation, although it can make it more awkward to 
manage. 


To resolve dependencies, you handle the AppDomain’s AssemblyResolve event, which fires
on all ALCs:
AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
{
  string fullAssemblyName = args.Name;
  // Locate and return an Assembly object, or return null if you can't:
  return null;
};


The args variable also includes a property called RequestingAssembly, which tells 
you which assembly triggered the resolution. 


After locating the assembly, you can then call Assembly.LoadFile to load it. 
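The following is a minimal sketch of such a handler. It assumes (hypothetically) that every dependency sits in a single plug-in folder and that each file is named after its simple assembly name; LegacyResolver, Install, and TryResolve are illustrative names rather than .NET APIs:

```csharp
using System;
using System.IO;
using System.Reflection;

// Illustrative helper (hypothetical names), not part of .NET.
static class LegacyResolver
{
  // Probes pluginFolder whenever the runtime fails to resolve an assembly.
  public static void Install (string pluginFolder)
  {
    AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
      TryResolve (pluginFolder, args.Name);
  }

  public static Assembly TryResolve (string pluginFolder, string fullAssemblyName)
  {
    // args.Name is a full name such as "Foo, Version=1.0.0.0, Culture=neutral, ...";
    // AssemblyName parses it so that we can extract the simple name.
    string simpleName = new AssemblyName (fullAssemblyName).Name;

    // Assumption: each dependency is stored as <simple name>.dll in the plug-in folder.
    string candidate = Path.GetFullPath (Path.Combine (pluginFolder, simpleName + ".dll"));

    // LoadFile requires an absolute path; returning null means "not resolved here."
    return File.Exists (candidate) ? Assembly.LoadFile (candidate) : null;
  }
}
```

Because the handler fires only after the runtime’s own probing fails, returning null simply lets the load fail as it otherwise would.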


You can enumerate all of the assemblies that have been loaded
into the current application domain with AppDomain.CurrentDomain.GetAssemblies().
This works in .NET Core, too,
where it’s equivalent to the following:


AssemblyLoadContext.All.SelectMany (a => a.Assemblies)
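In .NET Core 3.0 or later, you can compare the two views directly. The LoadedAssemblies class below is purely illustrative:

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;

// Illustrative helper (hypothetical name), not part of .NET.
static class LoadedAssemblies
{
  // Works on .NET Framework and .NET Core alike:
  public static Assembly[] ViaAppDomain() => AppDomain.CurrentDomain.GetAssemblies();

  // .NET Core 3.0+ only: flatten the assemblies of every AssemblyLoadContext.
  public static Assembly[] ViaLoadContexts() =>
    AssemblyLoadContext.All.SelectMany (alc => alc.Assemblies).ToArray();
}
```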


Writing a Plug-In System 


To fully demonstrate the concepts that we've covered in this section, let’s write a 
plug-in system that uses unloadable ALCs to isolate each plug-in. 


Our demo system will initially comprise three .NET Core projects: 


Plugin.Common (library) 
Defines an interface that plug-ins will implement 


Capitalizer (library) 
A plug-in that capitalizes text 


Plugin.Host (Console application) 
Locates and invokes plug-ins 


Let’s assume that the projects reside in the following directories: 


c:\source\PluginDemo\Plugin.Common
c:\source\PluginDemo\Capitalizer
c:\source\PluginDemo\Plugin.Host


All projects will reference the Plugin.Common library, and there will be no other 
interproject references. 


If Plugin.Host were to reference Capitalizer, we wouldn't be
writing a plug-in system; the central idea is that the plug-ins
are written by third parties after Plugin.Host and Plugin.Common
have been published.


If you're using Visual Studio, it can be convenient to put all 
three projects into a single solution for the sake of this demo. 
If you do so, right-click the Plugin.Host project, choose Build 
Dependencies > Project Dependencies, and then tick the Cap- 
italizer project. This forces Capitalizer to build when you run 
the Plugin.Host project, without adding a reference. 
Plugin.Common


Let’s begin with Plugin.Common. Our plug-ins will perform a very simple task,
which is to transform a string. Here’s how we'll define the interface:


namespace Plugin.Common
{
  public interface ITextPlugin
  {
    string TransformText (string input);
  }
}


That's all there is to Plugin.Common. 


Capitalizer (plug-in) 


Our Capitalizer plug-in will reference Plugin.Common and contain a single class.
For now, we'll keep the logic simple so that the plug-in has no extra dependencies:


public class CapitalizerPlugin : Plugin.Common.ITextPlugin
{
  public string TransformText (string input) => input.ToUpper();
}


If you build both projects and look in Capitalizer’s output folder, you'll see the fol- 
lowing two assemblies: 


Capitalizer.dll       // Our plug-in assembly
Plugin.Common.dll     // Referenced assembly


Plugin.Host 


Plugin.Host is a Console application with two classes. The first class is a custom 
ALC to load the plug-ins: 


using System;
using System.IO;
using System.Reflection;
using System.Runtime.Loader;
using Plugin.Common;

class PluginLoadContext : AssemblyLoadContext
{
  AssemblyDependencyResolver _resolver;

  public PluginLoadContext (string pluginPath, bool collectible)
    // Give it a friendly name to help with debugging:
    : base (name: Path.GetFileName (pluginPath), collectible)
  {
    // Create a resolver to help us find dependencies.
    _resolver = new AssemblyDependencyResolver (pluginPath);
  }

  protected override Assembly Load (AssemblyName assemblyName)
  {
    // See below
    if (assemblyName.Name == typeof (ITextPlugin).Assembly.GetName().Name)
      return null;

    string target = _resolver.ResolveAssemblyToPath (assemblyName);

    if (target != null)
      return LoadFromAssemblyPath (target);

    // Could be a framework assembly. Allow the default context to resolve.
    return null;
  }

  protected override IntPtr LoadUnmanagedDll (string unmanagedDllName)
  {
    string path = _resolver.ResolveUnmanagedDllToPath (unmanagedDllName);

    return path == null
      ? IntPtr.Zero
      : LoadUnmanagedDllFromPath (path);
  }
}


In the constructor, we pass in the path to the main plug-in assembly as well as a flag
to indicate whether we’d like the ALC to be collectible (so that it can be unloaded).


The Load method is where we handle dependency resolution. All plug-ins must
reference Plugin.Common so that they can implement ITextPlugin. This means that
the Load method will fire at some point to resolve Plugin.Common. We need to be
careful because the plug-in’s output folder is likely to contain not only
Capitalizer.dll, but also its own copy of Plugin.Common.dll. If we were to load this copy of
Plugin.Common.dll into the PluginLoadContext, we’d end up with two copies of the
assembly: one in the host’s default context, and one in the plug-in’s PluginLoadContext.
The assemblies would be incompatible, and the host would complain that
the plug-in does not implement ITextPlugin!


To solve this, we check explicitly for this condition: 


if (assemblyName.Name == typeof (ITextPlugin).Assembly.GetName().Name) 
return null; 


Returning null allows the host’s default ALC to instead resolve the assembly. 


Instead of returning null, we could return typeof(ITextPlugin).Assembly,
and it would also work correctly. How can
we be certain that ITextPlugin will resolve on the host’s ALC
and not on our PluginLoadContext? Remember that our
PluginLoadContext class is defined in the Plugin.Host
assembly. Therefore, any types that you statically reference
from this class will trigger an assembly resolution on the ALC
into which its assembly, Plugin.Host, was loaded.


After checking for the common assembly, we use AssemblyDependencyResolver to 
locate any private dependencies that the plug-in might have. (Right now, there will 
be none.) 


Notice that we also override the LoadUnmanagedDll method. This ensures that if the
plug-in has any unmanaged dependencies, these will load correctly, too.
The second class to write in Plugin.Host is the main program itself. For simplicity,
let’s hardcode the path to our Capitalizer plug-in (in real life, you might discover the
paths of plug-ins by looking for DLLs in known locations or by reading from a
configuration file):


using System;
using System.Linq;
using System.Reflection;
using Plugin.Common;

class Program
{
  const bool UseCollectibleContexts = true;

  static void Main()
  {
    const string capitalizer = @"C:\source\PluginDemo\"
      + @"Capitalizer\bin\Debug\netcoreapp3.0\Capitalizer.dll";

    Console.WriteLine (TransformText ("big apple", capitalizer));
  }

  static string TransformText (string text, string pluginPath)
  {
    var alc = new PluginLoadContext (pluginPath, UseCollectibleContexts);
    try
    {
      Assembly assem = alc.LoadFromAssemblyPath (pluginPath);

      // Locate the type in the assembly that implements ITextPlugin:
      Type pluginType = assem.ExportedTypes.Single (t =>
        typeof (ITextPlugin).IsAssignableFrom (t));

      // Instantiate the ITextPlugin implementation:
      var plugin = (ITextPlugin)Activator.CreateInstance (pluginType);

      // Call the TransformText method:
      return plugin.TransformText (text);
    }
    finally
    {
      if (UseCollectibleContexts) alc.Unload();    // unload the ALC
    }
  }
}


Let’s look at the TransformText method. We first instantiate a new ALC for our 
plug-in and then ask it to load the main plug-in assembly. Next, we use Reflection to 
locate the type that implements ITextPlugin (we cover this in detail in Chapter 19). 
Then, we instantiate the plug-in, call the TransformText method, and unload the 
ALC. 


If you needed to call the TransformText method repeatedly, a 
better approach would be to cache the ALC rather than 
unloading it after each call. 
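One way to do that is sketched below. PluginCache<TPlugin> is a hypothetical helper, not part of .NET: it takes a factory that creates the ALC for a given path (in this demo, you could pass path => new PluginLoadContext (path, true)), keeps one context and one plug-in instance per path, and unloads every collectible context on demand. It assumes single-threaded use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper: caches one ALC and one plug-in instance per plug-in path.
// Not thread-safe; add locking if plug-ins are requested from several threads.
class PluginCache<TPlugin> where TPlugin : class
{
  readonly Func<string, AssemblyLoadContext> _contextFactory;
  readonly Dictionary<string, (AssemblyLoadContext Alc, TPlugin Plugin)> _cache =
    new Dictionary<string, (AssemblyLoadContext, TPlugin)>();

  public PluginCache (Func<string, AssemblyLoadContext> contextFactory)
    => _contextFactory = contextFactory;

  public int Count => _cache.Count;

  public TPlugin Get (string pluginPath)
  {
    if (_cache.TryGetValue (pluginPath, out var entry)) return entry.Plugin;

    AssemblyLoadContext alc = _contextFactory (pluginPath);
    try
    {
      Assembly assem = alc.LoadFromAssemblyPath (pluginPath);
      Type pluginType = assem.ExportedTypes.Single (t =>
        typeof (TPlugin).IsAssignableFrom (t));
      var plugin = (TPlugin) Activator.CreateInstance (pluginType);
      _cache [pluginPath] = (alc, plugin);
      return plugin;
    }
    catch
    {
      // Don't leave a half-initialized context behind.
      if (alc.IsCollectible) alc.Unload();
      throw;
    }
  }

  public void UnloadAll()
  {
    foreach (var (alc, _) in _cache.Values)
      if (alc.IsCollectible) alc.Unload();
    _cache.Clear();
  }
}
```

With such a cache, TransformText would reduce to cache.Get (pluginPath).TransformText (text), and the host would call UnloadAll once, when it no longer needs the plug-ins.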


Here's the output: 


BIG APPLE 







Adding dependencies 


Our code is fully capable of resolving and isolating dependencies. To illustrate, let’s 
first add a NuGet reference to Humanizer.Core, version 2.6.2. You can do this via the 
Visual Studio UI, or by adding the following element to the Capitalizer.csproj file: 


<ItemGroup>
  <PackageReference Include="Humanizer.Core" Version="2.6.2" />
</ItemGroup>


Now, modify CapitalizerPlugin as follows:


using Humanizer;

namespace Capitalizer
{
  public class CapitalizerPlugin : Plugin.Common.ITextPlugin
  {
    public string TransformText (string input) => input.Pascalize();
  }
}


If you rerun the program, the output will now be this: 


BigApple


Next, we create another plug-in called Pluralizer. Create a new .NET Core library
project, and add a NuGet reference to Humanizer.Core, version 2.7.9:


<ItemGroup>
  <PackageReference Include="Humanizer.Core" Version="2.7.9" />
</ItemGroup>


Now, add a class called PluralizerPlugin. This will be similar to CapitalizerPlugin,
but we call the Pluralize method instead:


using Humanizer; 
namespace Pluralizer 


{ 
public class PluralizerPlugin : Plugin.Common.ITextPlugin 
{ 
public string TransformText (string input) => input.Pluralize(); 
} 
} 


Finally, we need to add code to the Plugin.Host’s Main method to load and run the 
Pluralizer plug-in: 


static void Main()
{
  const string capitalizer = @"C:\source\PluginDemo\"
    + @"Capitalizer\bin\Debug\netcoreapp3.0\Capitalizer.dll";

  Console.WriteLine (TransformText ("big apple", capitalizer));

  const string pluralizer = @"C:\source\PluginDemo\"
    + @"Pluralizer\bin\Debug\netcoreapp3.0\Pluralizer.dll";

  Console.WriteLine (TransformText ("big apple", pluralizer));
}

The output will now be like this: 


BigApple 
big apples 


To fully see what's going on, change the UseCollectibleContexts constant to false 
and add the following code to the Main method to enumerate the ALCs and their 
assemblies: 


foreach (var context in AssemblyLoadContext.All)
{
  Console.WriteLine ($"Context: {context.GetType().Name} {context.Name}");

  foreach (var assembly in context.Assemblies)
    Console.WriteLine ($"  Assembly: {assembly.FullName}");
}


In the output, you can see two different versions of Humanizer, each loaded into its 
own ALC: 


Context: PluginLoadContext Capitalizer.dll
  Assembly: Capitalizer, Version=1.0.0.0, Culture=neutral, PublicKeyToken=...
  Assembly: Humanizer, Version=2.6.0.0, Culture=neutral, PublicKeyToken=...
Context: PluginLoadContext Pluralizer.dll
  Assembly: Pluralizer, Version=1.0.0.0, Culture=neutral, PublicKeyToken=...
  Assembly: Humanizer, Version=2.7.0.0, Culture=neutral, PublicKeyToken=...
Context: DefaultAssemblyLoadContext Default
  Assembly: System.Private.CoreLib, Version=4.0.0.0, Culture=neutral,...
  Assembly: Host, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null


Even if both plug-ins were to use the same version of Human- 
izer, the isolation of separate assemblies can still be beneficial 
because each will have its own static variables. 
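If you’d like the host to spot side-by-side versions programmatically, a sketch like the following could help. VersionReport and FindDuplicates are hypothetical names; the method groups every loaded assembly by simple name across all ALCs (.NET Core 3.0 or later) and returns the names that appear more than once, along with their versions. With UseCollectibleContexts set to false and both plug-ins loaded, you would expect Humanizer to be among the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Loader;

// Hypothetical diagnostic helper, not part of .NET.
static class VersionReport
{
  // Simple names of assemblies loaded into more than one ALC, with the versions found.
  public static Dictionary<string, string[]> FindDuplicates() =>
    AssemblyLoadContext.All
      .SelectMany (alc => alc.Assemblies)
      .Select (a => a.GetName())
      .Where (n => n.Name != null)
      .GroupBy (n => n.Name)
      .Where (g => g.Count() > 1)
      .ToDictionary (
        g => g.Key,
        g => g.Select (n => n.Version?.ToString()).Distinct().ToArray());
}
```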
19


Reflection and Metadata 








As we saw in Chapter 18, a C# program compiles into an assembly that includes 
metadata, compiled code, and resources. Inspecting the metadata and compiled 
code at runtime is called reflection. 


The compiled code in an assembly contains almost all of the content of the original 
source code. Some information is lost, such as local variable names, comments, and 
preprocessor directives. However, reflection can access pretty much everything else, 
even making it possible to write a decompiler. 


Many of the services available in .NET and exposed via C# (such as dynamic bind- 
ing, serialization, and data binding) depend on the presence of metadata. Your own 
programs can also take advantage of this metadata and even extend it with new 
information using custom attributes. The System.Reflection namespace houses 
the reflection API. It is also possible at runtime to dynamically create new metadata 
and executable instructions in IL via the classes in the System.Reflection.Emit 
namespace. 


The examples in this chapter assume that you import the System,
System.Reflection, and System.Reflection.Emit namespaces.


When we use the term “dynamically” in this chapter, we mean 
using reflection to perform some task whose type safety is 
enforced only at runtime. This is similar in principle to 
dynamic binding via C#’s dynamic keyword, although the 
mechanism and functionality are different. 


Dynamic binding is much easier to use and employs the 
Dynamic Language Runtime (DLR) for dynamic language 
interoperability. Reflection is relatively clumsy to use, but it is 
more flexible in terms of what you can do with the CLR. For 
instance, reflection lets you obtain lists of types and members, 
instantiate an object whose name comes from a string, and 
build assemblies on the fly. 
Reflecting and Activating Types


In this section, we examine how to obtain a Type, inspect its metadata, and use it to 
dynamically instantiate an object. 


Obtaining a Type 


An instance of System.Type represents the metadata for a type. Because Type is 
widely used, it lives in the System namespace rather than the System.Reflection 
namespace. 


You can get an instance of a System.Type by calling GetType on any object or with
C#’s typeof operator:


Type t1 = DateTime.Now.GetType(); // Type obtained at runtime 
Type t2 = typeof (DateTime); // Type obtained at compile time 
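The distinction matters when an expression’s static type differs from its runtime type. In this sketch (TypeOfDemo is an illustrative name), the variable is declared as object, but GetType still reports the type of the actual instance:

```csharp
using System;

// Illustrative class (hypothetical name).
static class TypeOfDemo
{
  public static Type Runtime()
  {
    object o = "hello";     // Static type is object...
    return o.GetType();     // ...but GetType reports the runtime type, System.String
  }

  // typeof is resolved when the code is compiled:
  public static Type CompileTime() => typeof (object);
}
```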


You can use typeof to obtain array types and generic types, as follows: 


Type t3 = typeof (DateTime[]);            // 1-d Array type
Type t4 = typeof (DateTime[,]);           // 2-d Array type
Type t5 = typeof (Dictionary<int,int>);   // Closed generic type
Type t6 = typeof (Dictionary<,>);         // Unbound generic type


You can also retrieve a Type by name. If you have a reference to its Assembly, call 
Assembly.GetType (we describe this further in the section “Reflecting Assemblies” 
on page 817): 


Type t = Assembly.GetExecutingAssembly().GetType ("Demos.TestProgram") ; 


If you don't have an Assembly object, you can obtain a type through its assembly 
qualified name (the type’s full name followed by the assembly’s fully or partially 
qualified name). The assembly implicitly loads as if you called Assembly.Load 
(string): 


Type t = Type.GetType ("System.Int32, System.Private.CoreLib"); 
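Note that Type.GetType returns null, rather than throwing, when it can’t find the type, unless you pass throwOnError: true. Types in the core library can also be found by full name alone. A short sketch (TypeLookup and Demos.NoSuchType are illustrative names):

```csharp
using System;

// Illustrative class (hypothetical names).
static class TypeLookup
{
  // Core-library types can be found without an assembly name:
  public static Type Int32Type() => Type.GetType ("System.Int32");

  // An unknown name yields null by default...
  public static Type Missing() => Type.GetType ("Demos.NoSuchType");

  // ...whereas throwOnError: true turns the failure into a TypeLoadException.
  public static Type MissingOrThrow() => Type.GetType ("Demos.NoSuchType", throwOnError: true);
}
```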


After you have a System.Type object, you can use its properties to access the type’s
name, assembly, base type, visibility, and so on:


Type stringType = typeof (string); 


string name = stringType.Name; // String 

Type baseType = stringType.BaseType; // typeof (Object) 
Assembly assem = stringType.Assembly; // System.Private.CoreLib 
bool isPublic = stringType.IsPublic; // true 


A System.Type instance is a window into the entire metadata for the type, and the
assembly in which it’s defined.


System.Type is abstract, so the typeof operator must actually
give you a subclass of Type. The subclass that the CLR uses is
internal to .NET and is called RuntimeType.
TypeInfo


Should you target .NET Core 1.x (or an older Windows Store profile), you’ll find
most of Type’s members are missing. These missing members are exposed instead
on a class called TypeInfo, which you obtain by calling GetTypeInfo. So, to get our
previous example to run, you would do this:


Type stringType = typeof(string); 

string name = stringType.Name; 

Type baseType = stringType.GetTypeInfo().BaseType; 
Assembly assem = stringType.GetTypeInfo().Assembly; 
bool isPublic = stringType.GetTypeInfo().IsPublic; 


TypeInfo also exists in .NET Core 2 and 3 (and .NET Framework 4.5+, and all .NET 
Standard versions), so the preceding code works almost universally. TypeInfo also 
includes additional properties and methods for reflecting over members. 


Obtaining array types 


As we just saw, typeof and GetType work with array types. You can also obtain an 
array type by calling MakeArrayType on the element type: 


Type simpleArrayType = typeof (int).MakeArrayType(); 
Console.WriteLine (simpleArrayType == typeof (int[])); // True 


You can create multidimensional arrays by passing an integer argument to Make 
ArrayType: 


Type cubeType = typeof (int).MakeArrayType (3); // cube shaped 
Console.WriteLine (cubeType == typeof (int[,,])); // True 


GetElementType does the reverse: it retrieves an array type’s element type:


Type e = typeof (int[]).GetElementType();     // e == typeof (int)


GetArrayRank returns the number of dimensions of a rectangular array:


int rank = typeof (int[,,]).GetArrayRank(); // 3 
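One subtlety is worth knowing: the CLR distinguishes a single-dimensional, zero-based array (a vector, such as int[]) from a multidimensional array whose rank happens to be 1. MakeArrayType() produces the former, whereas MakeArrayType (1) produces the latter (which reflection displays as Int32[*]). Jagged arrays are built by calling MakeArrayType repeatedly. ArrayTypeDemo is an illustrative name:

```csharp
using System;

// Illustrative class (hypothetical name).
static class ArrayTypeDemo
{
  public static Type Vector()  => typeof (int).MakeArrayType();      // int[]
  public static Type RankOne() => typeof (int).MakeArrayType (1);    // rank-1 array, not int[]

  // Calling MakeArrayType twice yields an array of arrays:
  public static Type Jagged()  => typeof (int).MakeArrayType().MakeArrayType();   // int[][]
}
```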


Obtaining nested types 
To retrieve nested types, call GetNestedTypes on the containing type: 


foreach (Type t in typeof (System.Environment).GetNestedTypes())
  Console.WriteLine (t.FullName);

// OUTPUT: System.Environment+SpecialFolder


Or:


foreach (TypeInfo t in typeof (System.Environment).GetTypeInfo()
                                                   .DeclaredNestedTypes)
  Console.WriteLine (t.FullName);


The one caveat with nested types is that the CLR treats a nested type as having spe- 
cial “nested” accessibility levels: 
Type t = typeof (System.Environment.SpecialFolder);

Console.WriteLine (t.IsPublic);          // False
Console.WriteLine (t.IsNestedPublic);    // True

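Also note that the parameterless GetNestedTypes returns only public nested types; to include private ones, pass BindingFlags (covered later in this chapter). In this sketch, NestingDemo, NestingReport, and their members are illustrative names:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Illustrative classes (hypothetical names).
class NestingDemo
{
  public class PublicInner { }
  class PrivateInner { }          // Nested types are private by default
}

static class NestingReport
{
  // Parameterless overload: public nested types only.
  public static string[] PublicNames() =>
    typeof (NestingDemo).GetNestedTypes().Select (t => t.Name).ToArray();

  // Ask for nonpublic nested types explicitly via BindingFlags.
  public static string[] AllNames() =>
    typeof (NestingDemo)
      .GetNestedTypes (BindingFlags.Public | BindingFlags.NonPublic)
      .Select (t => t.Name).OrderBy (n => n).ToArray();

  public static bool IsPrivateInnerNestedPrivate() =>
    typeof (NestingDemo).GetNestedType ("PrivateInner", BindingFlags.NonPublic).IsNestedPrivate;
}
```

(The reflection helpers live in a separate class so that compiler-generated types for their lambdas don’t show up among NestingDemo’s nested types.)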
Type Names 


A type has Namespace, Name, and FullName properties. In most cases, FullName is a 
composition of the former two: 


Type t = typeof (System.Text.StringBuilder);

Console.WriteLine (t.Namespace);    // System.Text
Console.WriteLine (t.Name);         // StringBuilder
Console.WriteLine (t.FullName);     // System.Text.StringBuilder


There are two exceptions to this rule: nested types and closed generic types. 


Type also has a property called AssemblyQualifiedName, 
which returns FullName followed by a comma and then the 
full name of its assembly. This is the same string that you can 
pass to Type.GetType, and it uniquely identifies a type within 
the default loading context. 


Nested type names 
With nested types, the containing type appears only in FullName: 


Type t = typeof (System.Environment.SpecialFolder);

Console.WriteLine (t.Namespace);    // System
Console.WriteLine (t.Name);         // SpecialFolder
Console.WriteLine (t.FullName);     // System.Environment+SpecialFolder


The + symbol differentiates the containing type from a nested namespace. 
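The + notation is also what Type.GetType expects for nested types. Passing AssemblyQualifiedName (described in the preceding note) round-trips reliably. NestedNameDemo is an illustrative name:

```csharp
using System;

// Illustrative class (hypothetical name).
static class NestedNameDemo
{
  // AssemblyQualifiedName identifies the type unambiguously in the default context:
  public static Type RoundTrip (Type t) => Type.GetType (t.AssemblyQualifiedName);

  public static string FullNameOfSpecialFolder() =>
    typeof (Environment.SpecialFolder).FullName;   // System.Environment+SpecialFolder
}
```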


Generic type names 


Generic type names are suffixed with the ` symbol, followed by the number of type
parameters. If the generic type is unbound, this rule applies to both Name and
FullName:


Type t = typeof (Dictionary<,>);   // Unbound
Console.WriteLine (t.Name);        // Dictionary`2
Console.WriteLine (t.FullName);    // System.Collections.Generic.Dictionary`2


If the generic type is closed, however, FullName (only) acquires a substantial extra 
appendage. Each type parameter’s full assembly qualified name is enumerated: 


Console.WriteLine (typeof (Dictionary<int,string>).FullName);

// OUTPUT:
System.Collections.Generic.Dictionary`2[[System.Int32,
System.Private.CoreLib, Version=4.0.0.0, Culture=neutral,
PublicKeyToken=7cec85d7bea7798e],[System.String, System.Private.CoreLib,
Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]
This ensures that AssemblyQualifiedName (a combination of the type’s full name
and assembly name) contains enough information to fully identify both the generic
type and its type parameters.
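If you need a C#-style rendering instead (for logging, say), you can rebuild one from Name and GetGenericArguments. This is a minimal sketch; TypeNames.Friendly is a hypothetical helper that omits namespaces and doesn’t special-case arrays of generic types:

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of .NET.
static class TypeNames
{
  // Renders Dictionary`2 closed over Int32 and String as "Dictionary<Int32,String>".
  public static string Friendly (Type t)
  {
    if (!t.IsGenericType) return t.Name;

    // Strip the `N arity suffix (absent on some nested generic types).
    int tick = t.Name.IndexOf ('`');
    string baseName = tick < 0 ? t.Name : t.Name.Substring (0, tick);

    string args = string.Join (",", t.GetGenericArguments().Select (a => Friendly (a)));
    return baseName + "<" + args + ">";
  }
}
```

For an unbound type, the placeholder names appear instead: Friendly (typeof (Dictionary<,>)) gives Dictionary<TKey,TValue>.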


Array and pointer type names 
Arrays present with the same suffix that you use in a typeof expression: 


Console.WriteLine (typeof ( int[]  ).Name);       // Int32[]
Console.WriteLine (typeof ( int[,] ).Name);       // Int32[,]
Console.WriteLine (typeof ( int[,] ).FullName);   // System.Int32[,]


Pointer types are similar: 


Console.WriteLine (typeof (byte*).Name); // Byte* 


ref and out parameter type names 
A Type describing a ref or out parameter has an & suffix: 


public void RefMethod (ref int p)
{
  Type t = MethodInfo.GetCurrentMethod().GetParameters()[0].ParameterType;
  Console.WriteLine (t.Name);    // Int32&
}


More on this later, in the section “Reflecting and Invoking Members” on page 805. 
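You can also construct these types directly: MakeByRefType and MakePointerType add the & and * suffixes, IsByRef and IsPointer test for them, and GetElementType strips them. Because ref and out parameters both appear as Int32&, you tell an out parameter apart through ParameterInfo.IsOut. ByRefDemo and its methods are illustrative names:

```csharp
using System;
using System.Reflection;

// Illustrative class (hypothetical names).
static class ByRefDemo
{
  public static void RefMethod (ref int p) { }
  public static void OutMethod (out int p) => p = 0;

  public static Type IntByRef()   => typeof (int).MakeByRefType();     // Int32&
  public static Type IntPointer() => typeof (int).MakePointerType();   // Int32*

  public static ParameterInfo FirstParam (string methodName) =>
    typeof (ByRefDemo).GetMethod (methodName).GetParameters()[0];
}
```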


Base Types and Interfaces 
Type exposes a BaseType property: 


Type base1 = typeof (System.String).BaseType;
Type base2 = typeof (System.IO.FileStream).BaseType;

Console.WriteLine (base1.Name);    // Object
Console.WriteLine (base2.Name);    // Stream
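Because BaseType returns null for System.Object (and for interfaces), you can walk a class’s entire ancestry with a simple loop. Ancestry.Chain is an illustrative helper:

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper (hypothetical name).
static class Ancestry
{
  // Returns the type itself followed by each base type, ending with System.Object.
  public static List<Type> Chain (Type t)
  {
    var result = new List<Type>();
    for (Type current = t; current != null; current = current.BaseType)
      result.Add (current);
    return result;
  }
}
```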


The GetInterfaces method returns the interfaces that a type implements: 


foreach (Type iType in typeof (Guid).GetInterfaces())
  Console.WriteLine (iType.Name);


IFormattable
IComparable
IComparable`1
IEquatable`1


Reflection provides two dynamic equivalents to C#’s static is operator: 


IsInstanceOfType 
Accepts a type and instance 


IsAssignableFrom 
Accepts two types 
Here's an example of the first:


object obj = Guid.NewGuid(); 
Type target = typeof (IFormattable); 


bool isTrue = obj is IFormattable; // Static C# operator 
bool alsoTrue = target.IsInstanceOfType (obj);  // Dynamic equivalent 


IsAssignableFrom is more versatile: 


Type target = typeof (IComparable), source = typeof (string); 
Console.WriteLine (target.IsAssignableFrom (source)); // True 


The IsSubclassOf method works on the same principle as IsAssignableFrom but 
excludes interfaces. 
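The difference shows up with interfaces and with the type itself: a type is assignable from itself, but it isn’t a subclass of itself. A quick check (SubclassDemo is an illustrative name):

```csharp
using System;

// Illustrative class (hypothetical name).
static class SubclassDemo
{
  public static bool[] Compare() => new[]
  {
    typeof (string).IsSubclassOf (typeof (object)),            // True: class inheritance
    typeof (IComparable).IsAssignableFrom (typeof (string)),   // True: interface implementation
    typeof (string).IsSubclassOf (typeof (IComparable)),       // False: interfaces are excluded
    typeof (object).IsAssignableFrom (typeof (object)),        // True: identity
    typeof (object).IsSubclassOf (typeof (object))             // False: not a subclass of itself
  };
}
```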


Instantiating Types 


There are two ways to dynamically instantiate an object from its type: 


• Call the static Activator.CreateInstance method


• Call Invoke on a ConstructorInfo object obtained from calling GetConstructor
on a Type (advanced scenarios)


Activator.CreateInstance accepts a Type and optional arguments that it passes to
the constructor:


int i = (int) Activator.CreateInstance (typeof (int)); 


DateTime dt = (DateTime) Activator.CreateInstance (typeof (DateTime),
                                                   2000, 1, 1);


CreateInstance lets you specify many other options, such as the assembly from
which to load the type and whether to bind to a nonpublic constructor. A
MissingMethodException is thrown if the runtime can’t find a suitable constructor.
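For example, given a hypothetical class Point whose only constructor takes two integers, omitting the arguments fails at runtime rather than at compile time. ActivatorDemo and Point are illustrative names:

```csharp
using System;

// Illustrative classes (hypothetical names).
static class ActivatorDemo
{
  public class Point
  {
    public int X, Y;
    public Point (int x, int y) { X = x; Y = y; }   // No parameterless constructor
  }

  // The arguments are matched against Point's (int, int) constructor:
  public static Point WithArgs() => (Point) Activator.CreateInstance (typeof (Point), 3, 4);

  public static bool ThrowsWithoutArgs()
  {
    try { Activator.CreateInstance (typeof (Point)); return false; }
    catch (MissingMethodException) { return true; }   // No suitable constructor
  }
}
```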


Calling Invoke on a ConstructorInfo is necessary when your argument values can't 
disambiguate between overloaded constructors. For example, suppose that class X 
has two constructors: one accepting a parameter of type string, and another 
accepting a parameter of type StringBuilder. The target is ambiguous should you 
pass a null argument into Activator.CreateInstance. This is when you need to 
use a ConstructorInfo instead: 


// Fetch the constructor that accepts a single parameter of type string: 
ConstructorInfo ci = typeof (X).GetConstructor (new[] { typeof (string) }); 


// Construct the object using that overload, passing in null: 
object foo = ci.Invoke (new object[] { null }); 
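Here’s a self-contained version of that scenario, in which a hypothetical class X records which constructor ran (CtorDemo is also an illustrative name):

```csharp
using System;
using System.Reflection;
using System.Text;

// Illustrative classes (hypothetical names).
static class CtorDemo
{
  public class X
  {
    public readonly string Chosen;
    public X (string s) => Chosen = "string";
    public X (StringBuilder sb) => Chosen = "StringBuilder";
  }

  public static X CreateWithNullString()
  {
    // A null argument can't disambiguate the overloads, so select one by signature:
    ConstructorInfo ci = typeof (X).GetConstructor (new[] { typeof (string) });
    return (X) ci.Invoke (new object[] { null });
  }
}
```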


Or, if you’re targeting .NET Core 1 or an older Windows Store profile:


ConstructorInfo ci = typeof (X).GetTypeInfo().DeclaredConstructors
  .FirstOrDefault (c =>
     c.GetParameters().Length == 1 &&
     c.GetParameters()[0].ParameterType == typeof (string));


To obtain a nonpublic constructor, you need to specify BindingFlags—see “Access- 
ing Nonpublic Members” on page 813 in the later section “Reflecting and Invoking 
Members” on page 805. 


Dynamic instantiation adds a few microseconds onto the time 
taken to construct the object. This is quite a lot in relative 
terms because the CLR is ordinarily very fast in instantiating 
objects (a simple new on a small class takes in the region of 
tens of nanoseconds). 


To dynamically instantiate arrays based on just element type, first call MakeArrayType.
You can also instantiate generic types: we describe this in the next section.
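As an alternative, Array.CreateInstance accepts the element type directly; the array it returns (typed as Array) has the same runtime type that MakeArrayType() would give you. A sketch, with ArrayFactory as an illustrative name:

```csharp
using System;

// Illustrative helper (hypothetical name).
static class ArrayFactory
{
  // Creates a zero-based, single-dimensional array of the given element type.
  public static Array Create (Type elementType, int length) =>
    Array.CreateInstance (elementType, length);

  // The result's runtime type is the vector type that MakeArrayType() returns.
  public static bool MatchesMakeArrayType (Type elementType) =>
    Create (elementType, 0).GetType() == elementType.MakeArrayType();
}
```

Elements are then read and written through Array.GetValue and SetValue, or by casting to the concrete array type when it’s known.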


To dynamically instantiate a delegate, call Delegate.CreateDelegate. The follow- 
ing example demonstrates instantiating both an instance delegate and a static 
delegate: 


class Program
{
  delegate int IntFunc (int x);

  static int Square (int x) => x * x;        // Static method
  int Cube (int x) => x * x * x;             // Instance method

  static void Main()
  {
    Delegate staticD = Delegate.CreateDelegate
      (typeof (IntFunc), typeof (Program), "Square");

    Delegate instanceD = Delegate.CreateDelegate
      (typeof (IntFunc), new Program(), "Cube");

    Console.WriteLine (staticD.DynamicInvoke (3));      // 9
    Console.WriteLine (instanceD.DynamicInvoke (3));    // 27
  }
}


You can invoke the Delegate object that’s returned by calling DynamicInvoke, as we 
did in this example, or by casting to the typed delegate: 


IntFunc f = (IntFunc) staticD; 
Console.WriteLine (f(3)); // 9 (but much faster!) 


You can pass a MethodInfo into CreateDelegate instead of a method name. We 
describe MethodInfo shortly, in “Reflecting and Invoking Members” on page 805, 
along with the rationale for casting a dynamically created delegate back to the static 
delegate type. 
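As a brief preview, here’s the MethodInfo route, cast to the generic Func<int,int> delegate rather than a custom delegate type. This is a sketch; DelegateDemo is an illustrative name:

```csharp
using System;
using System.Reflection;

// Illustrative class (hypothetical name).
static class DelegateDemo
{
  public static int Square (int x) => x * x;

  public static Func<int, int> SquareViaReflection()
  {
    MethodInfo mi = typeof (DelegateDemo).GetMethod ("Square");
    // CreateDelegate returns Delegate; cast it to the typed delegate for direct calls.
    return (Func<int, int>) Delegate.CreateDelegate (typeof (Func<int, int>), mi);
  }
}
```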
Generic Types


A Type can represent a closed or unbound generic type. Just as at compile time, a 
closed generic type can be instantiated, whereas an unbound type cannot: 


Type closed = typeof (List<int>); 
List<int> list = (List<int>) Activator.CreateInstance (closed); // OK 


Type unbound = typeof (List<>); 
object anError = Activator.CreateInstance (unbound); // Runtime error 


The MakeGenericType method converts an unbound into a closed generic type. 
Simply pass in the desired type arguments: 


Type unbound = typeof (List<>); 
Type closed = unbound.MakeGenericType (typeof (int)); 


The GetGenericTypeDefinition method does the opposite: 
Type unbound2 = closed.GetGenericTypeDefinition(); // unbound == unbound2 
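Combining MakeGenericType with Activator.CreateInstance lets you create, say, a List<T> whose T is known only at runtime. Because the result’s static type can’t be expressed in source, this sketch (GenericFactory is an illustrative name) returns it through the nongeneric IList interface:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Illustrative helper (hypothetical name).
static class GenericFactory
{
  public static IList CreateList (Type elementType)
  {
    Type closed = typeof (List<>).MakeGenericType (elementType);   // e.g., List<string>
    return (IList) Activator.CreateInstance (closed);
  }
}
```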


The IsGenericType property returns true if a Type is generic, and the
IsGenericTypeDefinition property returns true if the generic type is unbound. The
following tests whether a type is a nullable value type:


Type nullable = typeof (bool?);
Console.WriteLine (
  nullable.IsGenericType &&
  nullable.GetGenericTypeDefinition() == typeof (Nullable<>));   // True


GetGenericArguments returns the type arguments for closed generic types: 


Console.WriteLine (closed.GetGenericArguments()[0]);     // System.Int32
Console.WriteLine (nullable.GetGenericArguments()[0]); // System.Boolean 


For unbound generic types, GetGenericArguments returns pseudotypes that repre- 
sent the placeholder types specified in the generic type definition: 


Console.WriteLine (unbound.GetGenericArguments()[0]);    // T
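These members combine well. For instance, the following sketch finds the T in whichever IEnumerable<T> a type is or implements, by looking for a closed version of the IEnumerable<> definition among its interfaces. EnumerableTypes.ElementTypeOf is a hypothetical helper; if a type implements IEnumerable<T> for several Ts, it simply returns the first one found:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of .NET.
static class EnumerableTypes
{
  // Returns T if type is or implements IEnumerable<T>; otherwise null.
  public static Type ElementTypeOf (Type type)
  {
    var candidates = new[] { type }.Concat (type.GetInterfaces());
    Type match = candidates.FirstOrDefault (i =>
      i.IsGenericType && i.GetGenericTypeDefinition() == typeof (IEnumerable<>));
    return match?.GetGenericArguments()[0];
  }
}
```

Arrays qualify too, because int[] reports IEnumerable<int> among its interfaces.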


At runtime, all generic types are either unbound or closed. 
They're unbound in the (relatively unusual) case of an expres- 
sion such as typeof (Foo<>); otherwise, they’re closed. There’s 
no such thing as an open generic type at runtime: all open 
types are closed by the compiler. The method in the following 
class always prints False: 


class Foo<T>
{
  public void Test()
    => Console.Write (GetType().IsGenericTypeDefinition);
}


Reflecting and Invoking Members


The GetMembers method returns the members of a type. Consider the following: 


class Walnut 
{ 
private bool cracked; 
public void Crack() { cracked = true; } 


} 


We can reflect on its public members, as follows: 


MemberInfo[] members = typeof (Walnut).GetMembers(); 
foreach (MemberInfo m in members) 
Console.WriteLine (m); 


This is the result: 


Void Crack() 

System.Type GetType() 
System.String ToString() 
Boolean Equals(System.Object) 
Int32 GetHashCode() 

Void .ctor() 
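The private cracked field is missing because the parameterless GetMembers returns only public members. As a preview of the BindingFlags technique covered later in this chapter, this sketch (with a copy of Walnut nested in an illustrative WalnutDemo class) asks for declared instance members, public and nonpublic, and then reads the field’s value:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Illustrative class (hypothetical name) wrapping a copy of Walnut.
static class WalnutDemo
{
  public class Walnut
  {
    private bool cracked;
    public void Crack() { cracked = true; }
  }

  const BindingFlags DeclaredInstance =
    BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

  // Includes Crack, .ctor, and the private field; excludes inherited members.
  public static string[] MemberNames() =>
    typeof (Walnut).GetMembers (DeclaredInstance).Select (m => m.Name).ToArray();

  public static bool IsCrackedAfterCrack()
  {
    var w = new Walnut();
    w.Crack();
    FieldInfo f = typeof (Walnut).GetField ("cracked", BindingFlags.Instance | BindingFlags.NonPublic);
    return (bool) f.GetValue (w);
  }
}
```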





Reflecting Members with TypeInfo


TypeInfo exposes a different (and somewhat simpler) protocol for reflecting over 
members. Using this API is optional in .NET Core 2+, but mandatory 
in .NET Core 1 and older Windows Store apps given that there’s no exact equivalent 
to the GetMembers method. 


Instead of exposing methods like GetMembers that return arrays, TypeInfo exposes 
properties that return IEnumerable<T>, upon which you typically run LINQ. The 
broadest is DeclaredMembers: 


IEnumerable<MemberInfo> members =
  typeof (Walnut).GetTypeInfo().DeclaredMembers;


Unlike with GetMembers(), the result excludes inherited members:


Void Crack() 
Void .ctor() 
Boolean cracked 


There are also properties for returning specific kinds of members (DeclaredProperties,
DeclaredMethods, DeclaredEvents, and so on) and methods for returning a
specific member by name (e.g., GetDeclaredMethod). The latter cannot be used
on overloaded methods (because there’s no way to specify parameter types). Instead,
you run a LINQ query over DeclaredMethods:


MethodInfo method = typeof (int).GetTypeInfo().DeclaredMethods 
.FirstOrDefault (m => m.Name == "ToString" && 
m.GetParameters().Length == 0); 
When called with no arguments, GetMembers returns all the public members for a
type (and its base types). GetMember retrieves a specific member by name, although
it still returns an array because members can be overloaded:


MemberInfo[] m = typeof (Walnut).GetMember ("Crack"); 
Console.WriteLine (m[0]); // Void Crack() 


MemberInfo also has a property called MemberType of type MemberTypes. This is a
flags enum with these values:


All           Custom        Field         NestedType    TypeInfo
Constructor   Event         Method        Property


When calling GetMember (or FindMembers), you can pass in a MemberTypes instance to
restrict the kinds of members that it returns. Alternatively, you can restrict the result
set by calling GetMethods, GetFields, GetProperties, GetEvents, GetConstructors, or
GetNestedTypes. There are also singular versions of each of these to home in on a
specific member.


It pays to be as specific as possible when retrieving a type
member so that your code doesn’t break if additional members
are added later. If you’re retrieving a method by name, specifying
all parameter types ensures that your code will still work if
the method is later overloaded (we provide examples shortly,
in “Method Parameters” on page 811).
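For example, Math.Max has many overloads, so asking for it by name alone is ambiguous, whereas supplying the parameter types pins down one overload. A sketch (OverloadDemo is an illustrative name):

```csharp
using System;
using System.Reflection;

// Illustrative class (hypothetical name).
static class OverloadDemo
{
  // Selects the Max (int, int) overload specifically:
  public static MethodInfo MaxOfInt32() =>
    typeof (Math).GetMethod ("Max", new[] { typeof (int), typeof (int) });

  public static bool NameAloneIsAmbiguous()
  {
    try { typeof (Math).GetMethod ("Max"); return false; }
    catch (AmbiguousMatchException) { return true; }   // More than one public "Max"
  }
}
```

Calling OverloadDemo.MaxOfInt32().Invoke (null, new object[] { 3, 7 }) returns 7 (boxed); the null target reflects the fact that Max is static.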


A MemberInfo object has a Name property and two Type properties:


DeclaringType 
Returns the Type that defines the member 


ReflectedType 
Returns the Type upon which GetMembers was called 


The two differ when called on a member that’s defined in a base type: DeclaringType
returns the base type, whereas ReflectedType returns the subtype. The following
example highlights this:


class Program
{
  static void Main()
  {
    // MethodInfo is a subclass of MemberInfo; see Figure 19-1.

    MethodInfo test = typeof (Program).GetMethod ("ToString");
    MethodInfo obj  = typeof (object).GetMethod ("ToString");

    Console.WriteLine (test.DeclaringType);      // System.Object
    Console.WriteLine (obj.DeclaringType);       // System.Object
    Console.WriteLine (test.ReflectedType);      // Program
    Console.WriteLine (obj.ReflectedType);       // System.Object
    Console.WriteLine (test == obj);             // False
  }
}


Because they have different ReflectedTypes, the test and obj objects are not equal. 
Their difference, however, is purely a fabrication of the reflection API; our Program 
type has no distinct ToString method in the underlying type system. We can verify 
that the two MethodInfo objects refer to the same method in either of two ways: 


Console.WriteLine (test.MethodHandle == obj.MethodHandle) ; // True 


Console.WriteLine (test.MetadataToken == obj.MetadataToken // True 
&& test.Module == obj.Module); 


A MethodHandle is unique to each (genuinely distinct) method within a process; a 
MetadataToken is unique across all types and members within an assembly module. 
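The metadata-token comparison can be wrapped in a small helper. In this sketch, SameMethod, Base, and Derived are hypothetical names: Derived inherits Go without overriding it, so reflecting Go through Derived yields a distinct MethodInfo object (its ReflectedType differs) that nevertheless shares the token, module, and handle of the one reflected through Base.

```csharp
using System.Reflection;

public class Base { public virtual void Go() { } }
public class Derived : Base { }             // inherits Go without overriding it

public static class MethodIdentity
{
  // True when both objects denote the same underlying method,
  // whatever type they were reflected from.
  public static bool SameMethod (MethodBase a, MethodBase b) =>
    a.MetadataToken == b.MetadataToken && a.Module == b.Module;
}
```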


MemberInfo also defines methods to return custom attributes (see “Retrieving 
Attributes at Runtime” on page 822). 


You can obtain the MethodBase of the currently executing
method by calling MethodBase.GetCurrentMethod.


Member Types 


MemberInfo itself is light on members because it’s an abstract base for the types 
shown in Figure 19-1. 









[Figure: MemberInfo subtypes EventInfo, FieldInfo, PropertyInfo, MethodInfo, and ConstructorInfo]

Figure 19-1. Member types


You can cast a MemberInfo to its subtype, based on its MemberType property. If you
obtained a member via GetMethod, GetField, GetProperty, GetEvent, GetConstructor,
or GetNestedType (or their plural versions), a cast isn’t necessary.
Table 19-1 summarizes what methods to use for each kind of C# construct.
Table 19-1. Retrieving member metadata

C# construct   Method to use       Name to use             Result
Method         GetMethod           (method name)           MethodInfo
Property       GetProperty         (property name)         PropertyInfo
Indexer        GetDefaultMembers                           MemberInfo[] (containing PropertyInfo
                                                           objects if compiled in C#)
Field          GetField            (field name)            FieldInfo
Enum member    GetField            (member name)           FieldInfo
Event          GetEvent            (event name)            EventInfo
Constructor    GetConstructor                              ConstructorInfo
Finalizer      GetMethod           "Finalize"              MethodInfo
Operator       GetMethod           "op_" + operator name   MethodInfo
Nested type    GetNestedType       (type name)             Type





Each MemberInfo subclass has a wealth of properties and methods, exposing all 
aspects of the member’s metadata. This includes such things as visibility, modifiers, 
generic type arguments, parameters, return type, and custom attributes. 
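As a rough illustration of casting on MemberType, the following sketch switches on MemberType and reads one subtype-specific property in each case. The Gadget class and the Describe method are hypothetical.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Gadget
{
  public int Count;
  public string Name { get; set; }
  public event EventHandler Changed;
  public void Reset() { Changed?.Invoke (this, EventArgs.Empty); }
}

public static class MemberDescriber
{
  // Casts a MemberInfo to its concrete subtype, chosen by its MemberType property.
  public static string Describe (MemberInfo m)
  {
    switch (m.MemberType)
    {
      case MemberTypes.Field:
        return "field of type " + ((FieldInfo) m).FieldType.Name;
      case MemberTypes.Property:
        return "property of type " + ((PropertyInfo) m).PropertyType.Name;
      case MemberTypes.Event:
        return "event of type " + ((EventInfo) m).EventHandlerType.Name;
      case MemberTypes.Method:
        return "method returning " + ((MethodInfo) m).ReturnType.Name;
      case MemberTypes.Constructor:
        return "constructor with " + ((ConstructorInfo) m).GetParameters().Length
               + " parameter(s)";
      default:
        return m.MemberType.ToString();
    }
  }
}
```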


Here is an example of using GetMethod: 


MethodInfo m = typeof (Walnut).GetMethod ("Crack"); 
Console.WriteLine (m); // Void Crack() 
Console.WriteLine (m.ReturnType); // System.Void 


All *Info instances are cached by the reflection API on first use: 


MethodInfo method = typeof (Walnut).GetMethod ("Crack"); 
MemberInfo member = typeof (Walnut).GetMember ("Crack") [0]; 


Console.Write (method == member); // True 


As well as preserving object identity, caching improves the performance of what is 
otherwise a fairly slow API. 


C# Members versus CLR Members 


The preceding table illustrates that some of C#’s functional constructs don't have a 
1:1 mapping with CLR constructs. This makes sense because the CLR and reflection 
API were designed with all .NET languages in mind—you can use reflection even 
from Visual Basic. 


Some C# constructs (namely indexers, enums, operators, and finalizers) are
contrivances as far as the CLR is concerned. Specifically:
• A C# indexer translates to a property accepting one or more arguments,
  marked as the type’s [DefaultMember].

• A C# enum translates to a subtype of System.Enum with a static field for each
  member.

• A C# operator translates to a specially named static method, starting with “op_”;
  for example, "op_Addition".

• A C# finalizer translates to a method that overrides Finalize.
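To observe these translations through reflection, here is a sketch built on three hypothetical types, Money, Row, and Suit. The operator is found by its "op_Addition" name, the indexer surfaces through GetDefaultMembers as a property named Item, and the enum member is read back as a static field.

```csharp
using System.Reflection;

public struct Money
{
  public decimal Amount;
  public Money (decimal amount) { Amount = amount; }
  public static Money operator + (Money a, Money b) => new Money (a.Amount + b.Amount);
}

public class Row
{
  readonly string[] cells = { "a", "b", "c" };
  public string this [int index] => cells [index];      // C# indexer
}

public enum Suit { Hearts, Spades }

public static class ContrivanceDemo
{
  // Operators compile to specially named static methods ("op_" + name).
  public static object AddViaReflection (Money a, Money b) =>
    typeof (Money).GetMethod ("op_Addition").Invoke (null, new object[] { a, b });

  // An indexer compiles to a property named Item, identified by [DefaultMember].
  public static string DefaultMemberName() => typeof (Row).GetDefaultMembers()[0].Name;

  // Each enum member becomes a static field of the enum type.
  public static object SpadesViaField() => typeof (Suit).GetField ("Spades").GetValue (null);
}
```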
Another complication is that properties and events actually comprise two things:

• Metadata describing the property or event (encapsulated by PropertyInfo or
  EventInfo)
• One or two backing methods
In a C# program, the backing methods are encapsulated within the property or
event definition. But when compiled to IL, the backing methods present as ordinary
methods that you can call like any other. This means that GetMethods returns
property and event backing methods alongside ordinary methods:


class Test { public int X { get { return 0; } set {} } } 


void Demo()
{
  foreach (MethodInfo mi in typeof (Test).GetMethods())
    Console.Write (mi.Name + " ");
}
// OUTPUT: 


get_X set_X GetType ToString Equals GetHashCode 


You can identify these methods through the IsSpecialName property in MethodInfo.
IsSpecialName returns true for property, indexer, and event accessors, as well
as operators. It returns false only for conventional C# methods, and for the Finalize
method if a finalizer is defined.
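One way to use IsSpecialName is to filter accessors out of a method listing. This sketch assumes a hypothetical Thermo class; it also uses BindingFlags.DeclaredOnly (covered later in this chapter) to leave out the methods inherited from object.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Thermo
{
  public int X { get { return 0; } set { } }   // compiles to get_X and set_X
  public void Calibrate() { }                  // an ordinary method
}

public static class OrdinaryMethods
{
  // Public instance methods declared on Thermo itself, minus accessors.
  public static string[] Names() =>
    typeof (Thermo)
      .GetMethods (BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
      .Where (m => !m.IsSpecialName)
      .Select (m => m.Name)
      .ToArray();
}
```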


Here are the backing methods that C# generates: 


C# construct   Member type   Methods in IL
Property       Property      get_XXX and set_XXX
Indexer        Property      get_Item and set_Item
Event          Event         add_XXX and remove_XXX





Each backing method has its own associated MethodInfo object. You can access 
these as follows: 


PropertyInfo pi = typeof (Console).GetProperty ("Title");
MethodInfo getter = pi.GetGetMethod();                   // get_Title
MethodInfo setter = pi.GetSetMethod();                   // set_Title
MethodInfo[] both = pi.GetAccessors();                   // Length==2


GetAddMethod and GetRemoveMethod perform a similar job for EventInfo. 
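As a sketch of the event side, the following subscribes and unsubscribes a handler by invoking the add_ and remove_ backing methods directly instead of using += and -=. The Button class and CountClicks method are hypothetical.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Button
{
  public event EventHandler Click;
  public void RaiseClick() => Click?.Invoke (this, EventArgs.Empty);
}

public static class EventAccessorDemo
{
  public static int CountClicks()
  {
    int clicks = 0;
    var button = new Button();
    EventHandler handler = (sender, args) => clicks++;

    EventInfo click = typeof (Button).GetEvent ("Click");
    click.GetAddMethod().Invoke (button, new object[] { handler });      // add_Click
    button.RaiseClick();
    button.RaiseClick();

    click.GetRemoveMethod().Invoke (button, new object[] { handler });   // remove_Click
    button.RaiseClick();                       // handler detached: not counted
    return clicks;
  }
}
```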


To go in the reverse direction—from a MethodInfo to its associated PropertyInfo 
or EventInfo—you need to perform a query. LINQ is ideal for this job: 


PropertyInfo p = mi.DeclaringType.GetProperties() 
.First (x => x.GetAccessors (true).Contains (mi)); 
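Here is the query packaged as a reusable helper, assuming a hypothetical Account class whose setter is private. Passing true to GetAccessors includes nonpublic accessors; note that GetProperties() with no arguments returns only public properties, so accessors of nonpublic properties would not be found by this version. Using FirstOrDefault (rather than First) makes the helper return null for a method that isn't an accessor.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Account
{
  public decimal Balance { get; private set; }
}

public static class AccessorLookup
{
  // Maps an accessor MethodInfo back to its PropertyInfo, or null if none matches.
  public static PropertyInfo FindProperty (MethodInfo mi) =>
    mi.DeclaringType.GetProperties()
      .FirstOrDefault (p => p.GetAccessors (true).Contains (mi));
}
```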


Generic Type Members 
You can obtain member metadata for both unbound and closed generic types: 


PropertyInfo unbound = typeof (IEnumerator<>) .GetProperty ("Current"); 
PropertyInfo closed = typeof (IEnumerator<int>).GetProperty ("Current"); 


Console.WriteLine (unbound); // T Current 
Console.WriteLine (closed); // Int32 Current 


Console.WriteLine (unbound.PropertyType.IsGenericParameter); // True 
Console.WriteLine (closed.PropertyType.IsGenericParameter); // False 


The MemberInfo objects returned from unbound and closed generic types are 
always distinct, even for members whose signatures don't feature generic type 
parameters: 


PropertyInfo unbound = typeof (List<>) .GetProperty ("Count"); 
PropertyInfo closed = typeof (List<int>).GetProperty ("Count"); 


Console.WriteLine (unbound); // Int32 Count 
Console.WriteLine (closed); // Int32 Count 


Console.WriteLine (unbound == closed); // False 


Console.WriteLine (unbound.DeclaringType.IsGenericTypeDefinition); // True 
Console.WriteLine (closed.DeclaringType.IsGenericTypeDefinition); // False 


Members of unbound generic types cannot be dynamically invoked. 
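A quick way to confirm this: in the following sketch (the UnboundMemberDemo class is ours), reading Count through List<int> succeeds, while the same read through the unbound List<> metadata throws. We catch Exception rather than commit to a specific exception type.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class UnboundMemberDemo
{
  // Reading Count through the closed List<int> metadata works.
  public static object ClosedCount (List<int> list) =>
    typeof (List<int>).GetProperty ("Count").GetValue (list);

  // The same read through the unbound List<> metadata fails at runtime.
  public static bool UnboundFails (List<int> list)
  {
    PropertyInfo unbound = typeof (List<>).GetProperty ("Count");
    try
    {
      unbound.GetValue (list);
      return false;
    }
    catch (Exception)
    {
      return true;
    }
  }
}
```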


Dynamically Invoking a Member 


After you have a MethodInfo, PropertyInfo, or FieldInfo object, you can dynamically
call it or get/set its value. This is called late binding because you choose which
member to invoke at runtime rather than compile time.


To illustrate, the following uses ordinary static binding: 


string s = "Hello"; 
int length = s.Length; 


Here's the same thing performed dynamically with late binding: 
object s = "Hello";
PropertyInfo prop = s.GetType().GetProperty ("Length");
int length = (int) prop.GetValue (s, null);               // 5

GetValue and SetValue get and set the value of a PropertyInfo or FieldInfo. The
first argument is the instance, which can be null for a static member. Accessing an
indexer is just like accessing a property called Item, except that you provide indexer
values as the second argument when calling GetValue or SetValue.
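The indexer rule can be tried against List<string>, whose indexer appears as a property called Item. This sketch (LateBoundAccess is a hypothetical helper class) reads and writes an element through that property, and reads a static field by passing null as the instance.

```csharp
using System.Collections.Generic;
using System.Reflection;

public static class LateBoundAccess
{
  // Reads list[index] through the indexer's "Item" property.
  public static object ReadItem (List<string> list, int index)
  {
    PropertyInfo item = list.GetType().GetProperty ("Item");
    return item.GetValue (list, new object[] { index });   // index values go second
  }

  // Writes list[index] = value through the same property.
  public static void WriteItem (List<string> list, int index, string value)
  {
    PropertyInfo item = list.GetType().GetProperty ("Item");
    item.SetValue (list, value, new object[] { index });
  }

  // Static field: the instance argument is null.
  public static object ReadStaticField() =>
    typeof (string).GetField ("Empty").GetValue (null);
}
```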


To dynamically call a method, call Invoke on a MethodInfo, providing an array of 
arguments to pass to that method. If you get any of the argument types wrong, an 
exception is thrown at runtime. With dynamic invocation, you lose compile-time 
type safety, but you still have runtime type safety (just as with the dynamic 
keyword). 
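To see the runtime check in action, the following sketch (the TryAbs helper is our own) calls Math.Abs(int) through reflection. A correctly typed argument succeeds; a string argument makes Invoke throw an ArgumentException, which the sketch converts to a null result.

```csharp
using System;
using System.Reflection;

public static class RuntimeTypeSafety
{
  // Calls Math.Abs(int) via reflection; returns null if the argument
  // cannot be converted to the parameter type.
  public static object TryAbs (object argument)
  {
    MethodInfo abs = typeof (Math).GetMethod ("Abs", new[] { typeof (int) });
    try
    {
      return abs.Invoke (null, new[] { argument });
    }
    catch (ArgumentException)      // e.g., a string passed where an int is expected
    {
      return null;
    }
  }
}
```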


Method Parameters 


Suppose that we want to dynamically call string’s Substring method. Statically, we 
would do this as follows: 


Console.WriteLine ("stamp".Substring(2)); // "amp" 


Here’s the dynamic equivalent with reflection and late binding: 


Type type = typeof (string); 
Type[] parameterTypes = { typeof (int) }; 
MethodInfo method = type.GetMethod ("Substring", parameterTypes) ; 


object[] arguments = { 2 }; 
object returnValue = method.Invoke ("stamp", arguments); 
Console.WriteLine (returnValue) ; // "amp" 


Because the Substring method is overloaded, we had to pass an array of parameter 
types to GetMethod to indicate which version we wanted. Without the parameter 
types, GetMethod would throw an AmbiguousMatchException. 


The GetParameters method, defined on MethodBase (the base class for MethodInfo 
and ConstructorInfo), returns parameter metadata. We can continue our previous 
example as follows: 


ParameterInfo[] paramList = method.GetParameters();
foreach (ParameterInfo x in paramList)
{
  Console.WriteLine (x.Name);                 // startIndex
  Console.WriteLine (x.ParameterType);        // System.Int32
}
Dealing with ref and out parameters 


To pass ref or out parameters, call MakeByRefType on the type before obtaining the 
method. For instance, you can dynamically execute this code: 


int x; 
bool successfulParse = int.TryParse ("23", out x); 
as follows:


object[] args = { "23", 0 }; 

Type[] argTypes = { typeof (string), typeof (int).MakeByRefType() }; 
MethodInfo tryParse = typeof (int).GetMethod ("TryParse", argTypes); 
bool successfulParse = (bool) tryParse.Invoke (null, args); 


Console.WriteLine (successfulParse + " " + args[1]); // True 23 


This same approach works for both ref and out parameter types. 
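The same technique applies to ref parameters on your own methods. In this sketch, Counter.Increment and RefInvokeDemo are hypothetical; after Invoke returns, the updated value is read back out of the argument array.

```csharp
using System;
using System.Reflection;

// Hypothetical method with a ref parameter.
public static class Counter
{
  public static void Increment (ref int value) { value++; }
}

public static class RefInvokeDemo
{
  public static int IncrementViaReflection (int start)
  {
    Type[] paramTypes = { typeof (int).MakeByRefType() };   // Int32& matches "ref int"
    MethodInfo increment = typeof (Counter).GetMethod ("Increment", paramTypes);
    object[] args = { start };
    increment.Invoke (null, args);
    return (int) args[0];          // Invoke copies the updated value back into args
  }
}
```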


Retrieving and invoking generic methods 


Explicitly specifying parameter types when calling GetMethod can be essential in 
disambiguating overloaded methods. However, it’s impossible to specify generic 
parameter types. For instance, consider the System.Linq.Enumerable class, which 
overloads the Where method, as follows: 


public static IEnumerable<TSource> Where<TSource> 
(this IEnumerable<TSource> source, Func<TSource, bool> predicate); 


public static IEnumerable<TSource> Where<TSource> 
(this IEnumerable<TSource> source, Func<TSource, int, bool> predicate); 


To retrieve a specific overload, we must retrieve all methods and then manually find 
the desired overload. The following query retrieves the former overload of Where: 


from m in typeof (Enumerable).GetMethods() 

where m.Name == "Where" && m.IsGenericMethod 

let parameters = m.GetParameters() 

where parameters.Length == 2

let genArg = m.GetGenericArguments().First() 

let enumerableOfT = typeof (IEnumerable<>).MakeGenericType (genArg) 

let funcOfTBool = typeof (Func<,>).MakeGenericType (genArg, typeof (bool)) 


where parameters[0].ParameterType == enumerableOfT 
&& parameters[1].ParameterType == funcOfTBool 
select m 


Calling .Single() on this query gives the correct MethodInfo object with unbound
type parameters (we'll call it unboundMethod). The next step is to close the type
parameters by calling MakeGenericMethod:


var closedMethod = unboundMethod.MakeGenericMethod (typeof (int)); 


In this case, we've closed TSource with int, allowing us to call Enumerable.Where
with a source of type IEnumerable<int> and a predicate of type Func<int, bool>:


int[] source = { 3, 4, 5, 6, 7, 8 }; 
Func<int, bool> predicate = n => n % 2 == 1;   // Odd numbers only


We can now invoke the closed generic method: 


var query = (IEnumerable<int>) closedMethod.Invoke
  (null, new object[] { source, predicate });

foreach (int element in query) Console.Write (element + "|");  // 3|5|7|
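For convenience, here are the preceding fragments assembled into one self-contained sketch. The names WhereViaReflection, FindWhere, and OddNumbers are our own; the query and the MakeGenericMethod/Invoke steps are as described in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class WhereViaReflection
{
  // Finds Where<TSource>(IEnumerable<TSource>, Func<TSource,bool>).
  public static MethodInfo FindWhere() =>
    (from m in typeof (Enumerable).GetMethods()
     where m.Name == "Where" && m.IsGenericMethod
     let parameters = m.GetParameters()
     where parameters.Length == 2
     let genArg = m.GetGenericArguments().First()
     let enumerableOfT = typeof (IEnumerable<>).MakeGenericType (genArg)
     let funcOfTBool = typeof (Func<,>).MakeGenericType (genArg, typeof (bool))
     where parameters[0].ParameterType == enumerableOfT
        && parameters[1].ParameterType == funcOfTBool
     select m).Single();

  // Closes TSource with int, then invokes the static extension method.
  public static int[] OddNumbers (int[] source)
  {
    MethodInfo closedMethod = FindWhere().MakeGenericMethod (typeof (int));
    Func<int, bool> predicate = n => n % 2 == 1;
    var query = (IEnumerable<int>) closedMethod.Invoke
      (null, new object[] { source, predicate });
    return query.ToArray();
  }
}
```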
If you're using the System.Linq.Expressions API to dynamically
build expressions (Chapter 8), you don't need to go
to this trouble to specify a generic method. The
Expression.Call method is overloaded to let you specify the
closed type arguments of the method that you want to call:


int[] source = { 3, 4, 5, 6, 7, 8 }; 
Func<int, bool> predicate = n => n % 2 == 1; 


var sourceExpr = Expression.Constant (source); 
var predicateExpr = Expression.Constant (predicate); 


var callExpression = Expression.Call ( 
typeof (Enumerable), "Where", 
new[] { typeof (int) }, // Closed generic arg type. 
sourceExpr, predicateExpr); 
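The callExpression built above has not been executed yet. One way to run it, sketched below under the assumption that a compiled delegate is acceptable, is to wrap the call in a parameterless lambda with Expression.Lambda, compile it, and invoke the result. WhereViaExpression is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class WhereViaExpression
{
  public static int[] OddNumbers (int[] source)
  {
    Func<int, bool> predicate = n => n % 2 == 1;

    var callExpression = Expression.Call (
      typeof (Enumerable), "Where",
      new[] { typeof (int) },                  // closed generic type argument
      Expression.Constant (source), Expression.Constant (predicate));

    // Wrap the call in a parameterless lambda, compile it, then run it.
    var lambda = Expression.Lambda<Func<IEnumerable<int>>> (callExpression);
    return lambda.Compile()().ToArray();
  }
}
```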


Using Delegates for Performance 


Dynamic invocations are relatively inefficient, with an overhead typically in the few- 
microseconds region. If you're calling a method repeatedly in a loop, you can shift 
the per-call overhead into the nanoseconds region by instead calling a dynamically 
instantiated delegate that targets your dynamic method. In the following example, 
we dynamically call string’s Trim method a million times without significant 
overhead: 


delegate string StringToString (string s); 


static void Main() 


{ 
MethodInfo trimMethod = typeof (string).GetMethod ("Trim", new Type[0]); 


var trim = (StringToString) Delegate.CreateDelegate 
(typeof (StringToString), trimMethod); 
for (int i = 0; i < 1000000; i++) 
trim ("test"); 
} 


This is faster because the costly late binding (the calls to GetMethod and Delegate.CreateDelegate) happens just once.
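A variation of the same idea, using the built-in Func<string,string> instead of the custom StringToString delegate (this substitution is ours). Because Trim is an instance method, the delegate's single string parameter supplies the instance, which makes it an open-instance delegate. The lookup and binding happen once, in the static field initializer; TrimDelegateDemo and TrimAll are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class TrimDelegateDemo
{
  // Late binding happens once: find Trim() and bind it to a delegate.
  static readonly Func<string, string> trim =
    (Func<string, string>) Delegate.CreateDelegate (
      typeof (Func<string, string>),
      typeof (string).GetMethod ("Trim", Type.EmptyTypes));

  // Each call is an ordinary delegate invocation, not MethodInfo.Invoke.
  public static string TrimAll (string[] inputs)
  {
    var results = new string [inputs.Length];
    for (int i = 0; i < inputs.Length; i++)
      results [i] = trim (inputs [i]);
    return string.Join ("|", results);
  }
}
```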


Accessing Nonpublic Members 


All of the methods on types used to probe metadata (e.g., GetProperty, GetField,
etc.) have overloads that take a BindingFlags enum. This enum serves as a metadata
filter and allows you to change the default selection criteria. The most common
use for this is to retrieve nonpublic members (this works only in desktop apps).


For instance, consider the following class: 


class Walnut 


{ 


private bool cracked; 
public void Crack() { cracked = true; } 


public override string ToString() { return cracked.ToString(); } 


} 
We can uncrack the walnut as follows:


Type t = typeof (Walnut); 

Walnut w = new Walnut(); 

w.Crack(); 

FieldInfo f = t.GetField ("cracked", BindingFlags.NonPublic |
                                     BindingFlags.Instance);

f.SetValue (w, false); 

Console.WriteLine (w); // False 


Using reflection to access nonpublic members is powerful, but it is also dangerous 
because you can bypass encapsulation, creating an unmanageable dependency on 
the internal implementation of a type. 


The BindingFlags enum 


BindingFlags is intended to be bitwise-combined. To get any matches at all, you 
need to start with one of the following four combinations: 


BindingFlags.Public    | BindingFlags.Instance
BindingFlags.Public    | BindingFlags.Static
BindingFlags.NonPublic | BindingFlags.Instance 
BindingFlags.NonPublic | BindingFlags.Static 
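The requirement to include Instance or Static is easy to trip over, so here is a quick sketch using a hypothetical Vault class with one private instance field and one private static field. NonPublic on its own matches nothing.

```csharp
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Vault
{
  private int secret = 42;
  private static int count = 1;
  public int Peek() => secret + count;
}

public static class BindingFlagsDemo
{
  // Neither Instance nor Static: no field can match.
  public static int NonPublicOnly() =>
    typeof (Vault).GetFields (BindingFlags.NonPublic).Length;

  public static int NonPublicInstance() =>
    typeof (Vault).GetFields (BindingFlags.NonPublic | BindingFlags.Instance).Length;

  public static int NonPublicAll() =>
    typeof (Vault).GetFields (BindingFlags.NonPublic | BindingFlags.Instance
                                                     | BindingFlags.Static).Length;
}
```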


NonPublic includes internal, protected, protected internal, and private. 
The following example retrieves all the public static members of type object: 


BindingFlags publicStatic = BindingFlags.Public | BindingFlags.Static; 
MemberInfo[] members = typeof (object).GetMembers (publicStatic); 


The following example retrieves all the nonpublic members of type object, both 
static and instance: 


BindingFlags nonPublicBinding = 
BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Instance; 


MemberInfo[] members = typeof (object).GetMembers (nonPublicBinding); 


The DeclaredOnly flag excludes functions inherited from base types, unless they are 
overridden. 


The DeclaredOnly flag is somewhat confusing in that it 
restricts the result set (whereas all the other binding flags 
expand the result set). 
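As an illustration with hypothetical Animal and Dog classes, DeclaredOnly keeps Dog's own Fetch method and its override of Speak, while dropping the inherited Eat method and the members inherited from object.

```csharp
using System.Linq;
using System.Reflection;

// Hypothetical sample types for this sketch.
public class Animal
{
  public virtual string Speak() => "...";
  public void Eat() { }
}

public class Dog : Animal
{
  public override string Speak() => "Woof";   // overridden: still reported
  public void Fetch() { }
}

public static class DeclaredOnlyDemo
{
  public static string[] DeclaredMethodNames() =>
    typeof (Dog)
      .GetMethods (BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
      .Select (m => m.Name)
      .OrderBy (n => n)
      .ToArray();
}
```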


Generic Methods 


You cannot directly invoke generic methods; the following throws an exception: 


using System;
using System.Reflection;

class Program
{
  public static T Echo<T> (T x) { return x; }

  static void Main()
  {
    MethodInfo echo = typeof (Program).GetMethod ("Echo");
    Console.WriteLine (echo.IsGenericMethodDefinition);   // True
    echo.Invoke (null, new object[] { 123 } );            // Exception
  }
}


An extra step is required, which is to call MakeGenericMethod on the MethodInfo, 
specifying concrete generic type arguments. This returns another MethodInfo, 
which you can then invoke as follows: 


MethodInfo echo = typeof (Program).GetMethod ("Echo"); 

MethodInfo intEcho = echo.MakeGenericMethod (typeof (int)); 
Console.WriteLine (intEcho.IsGenericMethodDefinition) ; // False 
Console.WriteLine (intEcho.Invoke (null, new object[] { 3 } )); // 3 


Anonymously Calling Members of a Generic Interface 


Reflection is useful when you need to invoke a member of a generic interface and 
you don't know the type parameters until runtime. In theory, the need for this arises 
rarely if types are perfectly designed; of course, types are not always perfectly 
designed. 


For instance, suppose that we want to write a more powerful version of ToString 
that could expand the result of LINQ queries. We could start out as follows: 


public static string ToStringEx <T> (IEnumerable<T> sequence)
{
  ...
}


This is already quite limiting. What if sequence contained nested collections that we
also want to enumerate? We'd need to overload the method to cope:


public static string ToStringEx <T> (IEnumerable<IEnumerable<T>> sequence) 


And then what if sequence contained groupings, or projections of nested sequences? 
The static solution of method overloading becomes impractical—we need an 
approach that can scale to handle an arbitrary object graph, such as the following: 


public static string ToStringEx (object value)
{
  if (value == null) return "<null>";
  StringBuilder sb = new StringBuilder();

  if (value is List<>)                                           // Error
    sb.Append ("List of " + ((List<>) value).Count + " items");  // Error

  if (value is IGrouping<,>)                                     // Error
    sb.Append ("Group with key=" + ((IGrouping<,>) value).Key);  // Error

  // Enumerate collection elements if this is a collection,
  // recursively calling ToStringEx()
  ...

  return sb.ToString();
}


Unfortunately, this won't compile: you cannot invoke members of an unbound
generic type such as List<> or IGrouping<,>. In the case of List<>, we can solve the
problem by using the nongeneric IList interface instead:


if (value is IList) 
sb.AppendLine ("A list with " + ((IList) value).Count + " items"); 


We can do this because the designers of List<> had the
foresight to implement the nongeneric IList (as well as the
generic IList<T>). The same principle is worth considering
when writing your own generic types: having a nongeneric
interface or base class upon which consumers can fall back
can be extremely valuable.
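Applying that advice to a type of your own might look like the following sketch. IBox and Box<T> are hypothetical names: the generic class explicitly implements a nongeneric interface, so code that doesn't know T can still read the value without reflection.

```csharp
// Hypothetical nongeneric fallback interface.
public interface IBox
{
  object Value { get; }
}

public class Box<T> : IBox
{
  public T Value { get; }
  public Box (T value) { Value = value; }
  object IBox.Value => Value;        // nongeneric view of the same value
}

public static class BoxInspector
{
  // Works for any Box<T> without knowing T.
  public static string Describe (object o) =>
    o is IBox box ? "Box containing " + box.Value : "Not a box";
}
```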


The solution is not as simple for IGrouping<,>. Here’s how the interface is defined: 


public interface IGrouping <TKey,TElement> : IEnumerable <TElement>,
                                             IEnumerable
{
  TKey Key { get; }
}


There's no nongeneric type we can use to access the Key property, so here we must 
use reflection. The solution is not to invoke members of an unbound generic type 
(which is impossible), but to invoke members of a closed generic type, whose type 
arguments we establish at runtime. 


In the following chapter, we solve this more simply with C#’s 
dynamic keyword. A good indication for dynamic binding is 
when you would otherwise need to perform type gymnastics— 
as we are doing right now. 


The first step is to determine whether value implements IGrouping<,>, and if so, 
obtain its closed generic interface. We can do this most easily by executing a LINQ 
query. Then, we retrieve and invoke the Key property: 


public static string ToStringEx (object value)
{
  if (value == null) return "<null>";
  if (value.GetType().IsPrimitive) return value.ToString();

  StringBuilder sb = new StringBuilder();

  if (value is IList)
    sb.Append ("List of " + ((IList)value).Count + " items: ");

  Type closedIGrouping = value.GetType().GetInterfaces()
    .Where (t => t.IsGenericType &&
                 t.GetGenericTypeDefinition() == typeof (IGrouping<,>))
    .FirstOrDefault();

  if (closedIGrouping != null)   // Call the Key property on IGrouping<,>
  {
    PropertyInfo pi = closedIGrouping.GetProperty ("Key");
    object key = pi.GetValue (value, null);
    sb.Append ("Group with key=" + key + ": ");
  }

  if (value is IEnumerable)
    foreach (object element in ((IEnumerable)value))
      sb.Append (ToStringEx (element) + " ");

  if (sb.Length == 0) sb.Append (value.ToString());

  return "\r\n" + sb.ToString();
}


This approach is robust: it works whether IGrouping<,> is implemented implicitly 
or explicitly. The following demonstrates this method: 


Console.WriteLine (ToStringEx (new List<int> { 5, 6, 7 } )); 
Console.WriteLine (ToStringEx ("xyyzzz".GroupBy (c => c) )); 


List of 3 items: 5 6 7


Group with key=x: x 
Group with key=y: y y 
Group with key=z: z z z


Reflecting Assemblies 


You can dynamically reflect an assembly by calling GetType or GetTypes on an
Assembly object. The following retrieves the type called TestProgram in the Demos
namespace from the current assembly:


Type t = Assembly.GetExecutingAssembly().GetType ("Demos.TestProgram"); 
You can also obtain an assembly from an existing type: 

typeof (Foo).Assembly.GetType ("Demos.TestProgram"); 
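Note that GetType expects the namespace-qualified name and returns null (rather than throwing) when no type matches. The following sketch assumes a hypothetical Demos.TestProgram class, matching the name used in the text; AssemblyLookup is also our own name.

```csharp
using System;

namespace Demos
{
  // Hypothetical class matching the name used in the text.
  public class TestProgram { }

  public static class AssemblyLookup
  {
    // Looks up a type by its full name in the assembly that defines TestProgram.
    public static Type Find (string fullName) =>
      typeof (TestProgram).Assembly.GetType (fullName);
  }
}
```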
The next example lists all the types in the assembly mylib.dll in e:\demo: 


Assembly a = Assembly.LoadFile (@"e:\demo\mylib.dll"); 


foreach (Type t in a.GetTypes()) 
Console.WriteLine (t); 


Or: 


Assembly a = typeof (Foo).GetTypeInfo().Assembly; 


foreach (Type t in a.ExportedTypes) 
Console.WriteLine (t); 


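GetTypes returns all the types defined in the assembly, including nested and nonpublic types; ExportedTypes returns only the types visible outside the assembly (public types, and public types nested within other public types).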
Modules


Calling GetTypes on a multimodule assembly returns all types in all modules. As a 
result, you can ignore the existence of modules and treat an assembly as a type’s 
container. There is one case, though, for which modules are relevant—and that’s 
when dealing with metadata tokens. 


A metadata token is an integer that uniquely refers to a type, member, string, or 
resource within the scope of a module. IL uses metadata tokens, so if you're parsing 
IL, you'll need to be able to resolve them. The methods for doing this are defined in 
the Module type and are called ResolveType, ResolveMember, ResolveString, and 
ResolveSignature. We revisit this in the final section of this chapter, on writing a 
disassembler. 


You can obtain a list of all the modules in an assembly by calling GetModules. You 
can also access an assembly’s main module directly via its ManifestModule property. 
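As a small sketch of resolving a token, the following round-trips a method through its metadata token, resolving it against the module that defines it. The Sample and TokenRoundTrip names are hypothetical.

```csharp
using System.Reflection;

// Hypothetical sample type for this sketch.
public class Sample
{
  public void Work() { }
}

public static class TokenRoundTrip
{
  // A metadata token is meaningful only within its module, so resolve it
  // against the module that defines the method.
  public static MethodBase Resolve (MethodBase method) =>
    method.Module.ResolveMethod (method.MetadataToken);
}
```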


Working with Attributes 


The CLR allows additional metadata to be attached to types, members, and
assemblies through attributes. This is the mechanism by which many CLR functions such
as serialization and security are directed, making attributes an indivisible part of an
application.


A key characteristic of attributes is that you can write your own and then use them
just as you would any other attribute to “decorate” a code element with additional
information. This additional information is compiled into the underlying assembly
and can be retrieved at runtime using reflection to build services that work
declaratively, such as automated unit testing.


Attribute Basics 


There are three kinds of attributes: 


• Bit-mapped attributes
• Custom attributes
• Pseudocustom attributes


Of these, only custom attributes are extensible. 


The term “attribute” by itself can refer to any of the three, 
although in the C# world, it most often refers to custom 
attributes or pseudocustom attributes. 


Bit-mapped attributes (our terminology) map to dedicated bits in a type’s metadata. 
Most of C#’s modifier keywords, such as public, abstract, and sealed, compile to 
bit-mapped attributes. These attributes are very efficient because they consume 
minimal space in the metadata (usually just one bit), and the CLR can locate them 
with little or no indirection. The reflection API exposes them via dedicated
properties on Type (and other MemberInfo subclasses), such as IsPublic, IsAbstract, and
IsSealed. The Attributes property returns a flags enum that describes most of
them in one hit:
static void Main()
{
  TypeAttributes ta = typeof (Console).Attributes;
  MethodAttributes ma = MethodInfo.GetCurrentMethod().Attributes;
  Console.WriteLine (ta + "\r\n" + ma);
}


Here’s the result: 


AutoLayout, AnsiClass, Class, Public, Abstract, Sealed, BeforeFieldInit 
PrivateScope, Private, Static, HideBySig 
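Console appears as both Abstract and Sealed because it's a static class, and C# emits static classes with both bits set. The sketch below tests for that combination through the dedicated properties and reads the Sealed bit directly from the TypeAttributes flags. Utility, Leaf, and BitMappedDemo are hypothetical names.

```csharp
using System;
using System.Reflection;

public static class Utility { }          // hypothetical static class
public sealed class Leaf { }             // hypothetical sealed class

public static class BitMappedDemo
{
  // Static classes carry both the Abstract and Sealed bits.
  public static bool LooksStatic (Type t) => t.IsClass && t.IsAbstract && t.IsSealed;

  // The same Sealed bit, read from the TypeAttributes flags enum.
  public static bool HasSealedBit (Type t) => (t.Attributes & TypeAttributes.Sealed) != 0;
}
```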


In contrast, custom attributes compile to a blob that hangs off the type’s main
metadata table. All custom attributes are represented by a subclass of System.Attribute
and, unlike bit-mapped attributes, are extensible. The blob in the metadata
identifies the attribute class, and also stores the values of any positional or named
arguments that were specified when the attribute was applied. Custom attributes that you
define yourself are architecturally identical to those defined in .NET Core.


Chapter 4 described how to attach custom attributes to a type or member in C#. 
Here, we attach the predefined Obsolete attribute to the Foo class: 


[Obsolete] public class Foo {...} 


This instructs the compiler to incorporate an instance of ObsoleteAttribute into
the metadata for Foo, which then can be reflected at runtime by calling
GetCustomAttributes on a Type or MemberInfo object.


Pseudocustom attributes look and feel just like standard custom attributes. They are 
represented by a subclass of System.Attribute and are attached in the standard 
manner: 


[Serializable] public class Foo {...} 


The difference is that the compiler or CLR internally optimizes pseudocustom 
attributes by converting them to bit-mapped attributes. Examples include 
[Serializable] (Chapter 17), StructLayout, In, and Out (Chapter 25). Reflection 
exposes pseudocustom attributes through dedicated properties such as 
IsSerializable, and in many cases they are also returned as System.Attribute 
objects when you call GetCustomAttributes (SerializableAttribute included). 
This means that you can (almost) ignore the difference between pseudo- and
nonpseudocustom attributes (a notable exception is when using Reflection.Emit to
generate types dynamically at runtime; see “Emitting Assemblies and Types” on
page 830).
The AttributeUsage Attribute


AttributeUsage is an attribute applied to attribute classes. It instructs the compiler 
how the target attribute should be used: 


public sealed class AttributeUsageAttribute : Attribute 


{ 
public AttributeUsageAttribute (AttributeTargets validOn); 


public bool AllowMultiple { get; set; } 
public bool Inherited { get; set; } 
public AttributeTargets ValidOn { get; } 


} 


AllowMultiple controls whether the attribute being defined can be applied more 
than once to the same target; Inherited controls whether an attribute applied to a 
base class also applies to derived classes (or in the case of methods, whether an 
attribute applied to a virtual method also applies to overriding methods). ValidOn 
determines the set of targets (classes, interfaces, properties, methods, parameters, 
etc.) to which the attribute can be attached. It accepts any combination of values 
from the AttributeTargets enum, which has the following members: 


All            Delegate    GenericParameter    Parameter
Assembly       Enum        Interface           Property
Class          Event       Method              ReturnValue
Constructor    Field       Module              Struct


To illustrate, here’s how the authors of .NET Core have applied AttributeUsage to 
the Serializable attribute: 


[AttributeUsage (AttributeTargets.Delegate |
                 AttributeTargets.Enum     |
                 AttributeTargets.Struct   |
                 AttributeTargets.Class,     Inherited = false)]
public sealed class SerializableAttribute : Attribute { }


This is, in fact, almost the complete definition of the Serializable attribute.
Writing an attribute class that has no properties or special constructors is this simple.
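To see AllowMultiple and Inherited at work, here is a sketch with a hypothetical Tag attribute. It is applied twice to Engine, and because Inherited is false, the subclass TurboEngine reports no tags even when GetCustomAttributes is asked to search base types. Engine, TurboEngine, and TagReader are also hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical attribute: repeatable, for classes and methods, not inherited.
[AttributeUsage (AttributeTargets.Class | AttributeTargets.Method,
                 AllowMultiple = true, Inherited = false)]
public sealed class TagAttribute : Attribute
{
  public string Name { get; }
  public TagAttribute (string name) { Name = name; }
}

[Tag ("fast"), Tag ("stable")]
public class Engine { }

public class TurboEngine : Engine { }    // Inherited = false: no tags flow down

public static class TagReader
{
  // inherit: true asks for base-type attributes, but AttributeUsage still applies.
  public static string[] Tags (Type t) =>
    t.GetCustomAttributes (typeof (TagAttribute), true)
     .Cast<TagAttribute>()
     .Select (a => a.Name)
     .OrderBy (n => n)
     .ToArray();
}
```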


Defining Your Own Attribute 
Here's how to write your own attribute: 
1. Derive a class from System.Attribute or a descendant of System.Attribute.
   By convention, the class name should end with the word “Attribute,” although
   this isn’t required.

2. Apply the AttributeUsage attribute, described in the preceding section.

   If the attribute requires no properties or arguments in its constructor, the job is
   done.

3. Write one or more public constructors. The parameters to the constructor
   define the positional parameters of the attribute and will become mandatory
   when using the attribute.

4. Declare a public field or property for each named parameter you wish to
   support. Named parameters are optional when using the attribute.


Attribute properties and constructor parameters must be of 
the following types: 


• A sealed primitive type: in other words, bool, byte, char,
  double, float, int, long, short, or string
• The Type type
• An enum type
• A one-dimensional array of any of these

When an attribute is applied, it must also be possible for the
compiler to statically evaluate each of the properties or
constructor arguments.


The following class defines an attribute for assisting an automated unit-testing
system. It indicates that a method should be tested, the number of test repetitions, and
a message in case of failure:


[AttributeUsage (AttributeTargets.Method) ] 
public sealed class TestAttribute : Attribute 
{ 

public int Repetitions; 

public string FailureMessage; 


public TestAttribute () : this (1) {} 
public TestAttribute (int repetitions) { Repetitions = repetitions; } 


} 


Here’s a Foo class with methods decorated in various ways with the Test attribute: 


class Foo
{
  [Test]
  public void Method1() { ... }

  [Test(20)]
  public void Method2() { ... }

  [Test(20, FailureMessage="Debugging Time!")]
  public void Method3() { ... }
}
Retrieving Attributes at Runtime


There are two standard ways to retrieve attributes at runtime: 


• Call GetCustomAttributes on any Type or MemberInfo object
• Call Attribute.GetCustomAttribute or Attribute.GetCustomAttributes


These latter two methods are overloaded to accept any reflection object that
corresponds to a valid attribute target (Type, Assembly, Module, MemberInfo, or
ParameterInfo).


You can also call GetCustomAttributesData() on a type or
member to obtain attribute information. The difference
between this and GetCustomAttributes() is that the former
lets you know how the attribute was instantiated: it
reports the constructor overload that was used, and the value
of each constructor argument and named parameter. This is
useful when you want to emit code or IL to reconstruct the
attribute to the same state (see “Emitting Type Members” on
page 833).
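A sketch of what GetCustomAttributesData reports, using a hypothetical Retry attribute applied to a Jobs.Download method. The Describe method rebuilds an attribute-like string from the recorded constructor arguments and named arguments without constructing the attribute itself.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute and target for this sketch.
[AttributeUsage (AttributeTargets.Method)]
public sealed class RetryAttribute : Attribute
{
  public int Times;
  public string Reason;
  public RetryAttribute (int times) { Times = times; }
}

public class Jobs
{
  [Retry (3, Reason = "flaky network")]
  public void Download() { }
}

public static class AttributeDataDemo
{
  public static string Describe()
  {
    CustomAttributeData data = typeof (Jobs).GetMethod ("Download")
      .GetCustomAttributesData()
      .Single (d => d.AttributeType == typeof (RetryAttribute));

    // Positional (constructor) arguments, then named arguments.
    var ctorArgs  = data.ConstructorArguments.Select (a => a.Value.ToString());
    var namedArgs = data.NamedArguments.Select (n => n.MemberName + "=" + n.TypedValue.Value);

    return data.AttributeType.Name + "(" + string.Join (", ", ctorArgs.Concat (namedArgs)) + ")";
  }
}
```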


Here's how we can enumerate each method in the preceding Foo class that has a 
TestAttribute: 


foreach (MethodInfo mi in typeof (Foo).GetMethods())
{
  TestAttribute att = (TestAttribute) Attribute.GetCustomAttribute
    (mi, typeof (TestAttribute));

  if (att != null)
    Console.WriteLine ("Method {0} will be tested; reps={1}; msg={2}",
                        mi.Name, att.Repetitions, att.FailureMessage);
}

Or:

foreach (MethodInfo mi in typeof (Foo).GetTypeInfo().DeclaredMethods)


Here's the output: 


Method Method1 will be tested; reps=1; msg= 
Method Method2 will be tested; reps=20; msg= 
Method Method3 will be tested; reps=20; msg=Debugging Time! 


To complete the illustration of how we could use this to write a unit-testing system,
here's the same example expanded so that it actually calls the methods decorated
with the Test attribute:


foreach (MethodInfo mi in typeof (Foo).GetMethods())
{
  TestAttribute att = (TestAttribute) Attribute.GetCustomAttribute
    (mi, typeof (TestAttribute));

  if (att != null)
    for (int i = 0; i < att.Repetitions; i++)
      try
      {
        mi.Invoke (new Foo(), null);    // Call method with no arguments
      }
      catch (Exception ex)              // Wrap exception in att.FailureMessage
      {
        throw new Exception ("Error: " + att.FailureMessage, ex);
      }
}


Returning to attribute reflection, here’s an example that lists the attributes present 
on a specific type: 


[Serializable, Obsolete]
class Test
{
  static void Main()
  {
    object[] atts = Attribute.GetCustomAttributes (typeof (Test));
    foreach (object att in atts) Console.WriteLine (att);
  }
}


And, here's the output: 


System.ObsoleteAttribute 
System.SerializableAttribute 
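When you want a single attribute of a known type, the generic GetCustomAttribute<T> extension method (defined in CustomAttributeExtensions, namespace System.Reflection) avoids the cast used in the earlier examples. A minimal sketch; OldWidget, NewWidget, and GenericAttributeDemo are hypothetical names:

```csharp
using System;
using System.Reflection;

[Obsolete ("Use NewWidget instead")]
public class OldWidget { }

public static class GenericAttributeDemo
{
  // Returns the message of an [Obsolete] attribute on t, or null if none is present.
  // Like Attribute.GetCustomAttribute, GetCustomAttribute<T> throws
  // AmbiguousMatchException if more than one matching attribute is found.
  public static string ObsoleteMessage (Type t) =>
    t.GetCustomAttribute<ObsoleteAttribute>()?.Message;
}
```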


Dynamic Code Generation 


The System.Reflection.Emit namespace contains classes for creating metadata
and IL at runtime. Generating code dynamically is useful for certain kinds of
programming tasks. An example is the regular expressions API, which emits
performant types tuned to specific regular expressions. Another example is Entity
Framework Core, which uses Reflection.Emit to generate proxy classes to enable
lazy loading.


Generating IL with DynamicMethod 


The DynamicMethod class is a lightweight tool in the System.Reflection.Emit
namespace for generating methods on the fly. Unlike TypeBuilder, it doesn't
require that you first set up a dynamic assembly, module, and type in which to
contain the method. This makes it suitable for simple tasks, as well as serving as a
good introduction to Reflection.Emit.


A DynamicMethod and the associated IL are garbage-collected 
when no longer referenced. This means you can repeatedly 
generate dynamic methods without filling up memory. (To do 
the same with dynamic assemblies, you must apply the 
AssemblyBuilderAccess.RunAndCollect flag when creating 
the assembly.) 
Here is a simple use of DynamicMethod to create a method that writes Hello world
to the console:


using System;
using System.Reflection.Emit;

public class Test
{
  static void Main()
  {
    var dynMeth = new DynamicMethod ("Foo", null, null, typeof (Test));
    ILGenerator gen = dynMeth.GetILGenerator();
    gen.EmitWriteLine ("Hello world");
    gen.Emit (OpCodes.Ret);
    dynMeth.Invoke (null, null);                  // Hello world
  }
}


OpCodes has a static read-only field for every IL opcode. Most of the functionality is
exposed through various opcodes, although ILGenerator also has specialized methods
for generating labels and local variables and for exception handling. A method
always ends in OpCodes.Ret, which means “return,” or some kind of branching/
throwing instruction. The EmitWriteLine method on ILGenerator is a shortcut for
emitting a number of lower-level opcodes. We would get the same result if we
replaced the call to EmitWriteLine with this:


MethodInfo writeLineStr = typeof (Console).GetMethod ("WriteLine",
                            new Type[] { typeof (string) });
gen.Emit (OpCodes.Ldstr, "Hello world");     // Load a string
gen.Emit (OpCodes.Call, writeLineStr);       // Call a method


Note that we passed typeof (Test) into DynamicMethod’s constructor. This gives the 
dynamic method access to the nonpublic methods of that type, allowing us to do 
this: 


using System;
using System.Reflection;
using System.Reflection.Emit;

public class Test
{
  static void Main()
  {
    var dynMeth = new DynamicMethod ("Foo", null, null, typeof (Test));
    ILGenerator gen = dynMeth.GetILGenerator();

    MethodInfo privateMethod = typeof (Test).GetMethod ("HelloWorld",
      BindingFlags.Static | BindingFlags.NonPublic);

    gen.Emit (OpCodes.Call, privateMethod);    // Call HelloWorld
    gen.Emit (OpCodes.Ret);

    dynMeth.Invoke (null, null);               // Hello world
  }

  static void HelloWorld()       // private method, yet we can call it
  {
    Console.WriteLine ("Hello world");
  }
}
Understanding IL requires a considerable investment of time. Rather than
understand all the opcodes, it’s much easier to compile a C# program and then
examine, copy, and tweak the IL. LINQPad displays the IL for any method or code
snippet that you type, and assembly viewing tools such as ILSpy are useful for
examining existing assemblies.


The Evaluation Stack 


Central to IL is the concept of the evaluation stack. To call a method with
arguments, you first push (load) the arguments onto the evaluation stack and then
call the method. The method then pops the arguments it needs from the evaluation
stack. We demonstrated this previously, in calling Console.WriteLine. Here's a
similar example with an integer:


var dynMeth = new DynamicMethod ("Foo", null, null, typeof (void));
ILGenerator gen = dynMeth.GetILGenerator();
MethodInfo writeLineInt = typeof (Console).GetMethod ("WriteLine",
                            new Type[] { typeof (int) });

// The Ldc* op-codes load numeric literals of various types and sizes.

gen.Emit (OpCodes.Ldc_I4, 123);      // Push a 4-byte integer onto stack
gen.Emit (OpCodes.Call, writeLineInt);

gen.Emit (OpCodes.Ret);
dynMeth.Invoke (null, null);         // 123


To add two numbers together, you first load each number onto the evaluation stack,
and then call Add. The Add opcode pops two values from the evaluation stack and
pushes the result back on. The following adds 2 and 2, and then writes the result
using the writeLineInt method obtained previously:


gen.Emit (OpCodes.Ldc_I4, 2);          // Push a 4-byte integer, value=2
gen.Emit (OpCodes.Ldc_I4, 2);          // Push a 4-byte integer, value=2
gen.Emit (OpCodes.Add);                // Add them together
gen.Emit (OpCodes.Call, writeLineInt);


To calculate 10 / 2 + 1, you can do either this:


gen.Emit (OpCodes.Ldc_I4, 10);
gen.Emit (OpCodes.Ldc_I4, 2);
gen.Emit (OpCodes.Div);
gen.Emit (OpCodes.Ldc_I4, 1);
gen.Emit (OpCodes.Add);
gen.Emit (OpCodes.Call, writeLineInt);


or this:


gen.Emit (OpCodes.Ldc_I4, 1);
gen.Emit (OpCodes.Ldc_I4, 10);
gen.Emit (OpCodes.Ldc_I4, 2);
gen.Emit (OpCodes.Div);
gen.Emit (OpCodes.Add);
gen.Emit (OpCodes.Call, writeLineInt);
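To check that both orderings leave the same value on the evaluation stack, the following sketch emits each one into a dynamic method that returns the result instead of printing it. It uses CreateDelegate (covered later in this section) to obtain a Func<int>; EvalStackDemo and Build are hypothetical names:

```csharp
using System;
using System.Reflection.Emit;

public static class EvalStackDemo
{
  // Builds a parameterless dynamic method that evaluates 10 / 2 + 1
  // and returns the int left on the evaluation stack.
  public static Func<int> Build (bool pushOneFirst)
  {
    var dynMeth = new DynamicMethod ("Calc", typeof (int), Type.EmptyTypes,
                                     typeof (EvalStackDemo));
    ILGenerator gen = dynMeth.GetILGenerator();

    if (pushOneFirst)
    {
      gen.Emit (OpCodes.Ldc_I4, 1);    // stack: 1
      gen.Emit (OpCodes.Ldc_I4, 10);   // stack: 1, 10
      gen.Emit (OpCodes.Ldc_I4, 2);    // stack: 1, 10, 2
      gen.Emit (OpCodes.Div);          // stack: 1, 5
      gen.Emit (OpCodes.Add);          // stack: 6
    }
    else
    {
      gen.Emit (OpCodes.Ldc_I4, 10);   // stack: 10
      gen.Emit (OpCodes.Ldc_I4, 2);    // stack: 10, 2
      gen.Emit (OpCodes.Div);          // stack: 5
      gen.Emit (OpCodes.Ldc_I4, 1);    // stack: 5, 1
      gen.Emit (OpCodes.Add);          // stack: 6
    }
    gen.Emit (OpCodes.Ret);            // Return the single remaining value

    return (Func<int>) dynMeth.CreateDelegate (typeof (Func<int>));
  }
}
```

Either way, Build (...)() yields 6: the stack comments show that Div and Add always consume the two topmost values.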
Passing Arguments to a Dynamic Method


The Ldarg and Ldarg_XXX opcodes load an argument passed into a method onto the
stack. To return a value, leave exactly one value on the stack upon finishing. For this
to work, you must specify the return type and argument types when calling
DynamicMethod’s constructor. The following creates a dynamic method that returns
the sum of two integers:


DynamicMethod dynMeth = new DynamicMethod ("Foo", 
typeof (int), // Return type = int 
new[] { typeof (int), typeof (int) }, // Parameter types = int, int 
typeof (void)); 


ILGenerator gen = dynMeth.GetILGenerator(); 


gen.Emit (OpCodes.Ldarg_0); // Push first arg onto eval stack 
gen.Emit (OpCodes.Ldarg_1); // Push second arg onto eval stack 
gen.Emit (OpCodes.Add); // Add them together (result on stack) 
gen.Emit (OpCodes.Ret); // Return with stack having 1 value 


int result = (int) dynMeth.Invoke (null, new object[] { 3, 4 });   // 7


When you exit, the evaluation stack must have exactly 0 or 1 item (depending on
whether your method returns a value). If you violate this, the CLR will refuse to
execute your method. You can remove an item from the stack without processing it
by emitting OpCodes.Pop.


Rather than calling Invoke, it can be more convenient to work with a dynamic 
method as a typed delegate. The CreateDelegate method achieves just this. In our 
case, the delegate that we need has two integer parameters and an integer return 
type. We can use the Func<int, int, int> delegate for this purpose. The last line 
of our preceding example then becomes the following: 


var func = (Func<int,int,int>) dynMeth.CreateDelegate 
(typeof (Func<int,int,int>)); 
int result = func (3, 4); // 7 


A delegate also eliminates the overhead of dynamic method 
invocation—saving a few microseconds per call. 
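Methods with more parameters follow the same pattern: Ldarg_2 and Ldarg_3 load the third and fourth arguments (beyond that, Ldarg_S or Ldarg take an explicit index). A sketch under those assumptions, with hypothetical names ArgsDemo and BuildMulAdd, that emits a * b + c and binds it to a delegate:

```csharp
using System;
using System.Reflection.Emit;

public static class ArgsDemo
{
  // Emits the equivalent of: static int MulAdd (int a, int b, int c) => a * b + c;
  public static Func<int, int, int, int> BuildMulAdd()
  {
    var dynMeth = new DynamicMethod ("MulAdd",
      typeof (int),                                          // Return type
      new[] { typeof (int), typeof (int), typeof (int) },    // Parameter types
      typeof (ArgsDemo));                                    // Owner type

    ILGenerator gen = dynMeth.GetILGenerator();
    gen.Emit (OpCodes.Ldarg_0);   // Push a
    gen.Emit (OpCodes.Ldarg_1);   // Push b
    gen.Emit (OpCodes.Mul);       // Pop a and b, push a * b
    gen.Emit (OpCodes.Ldarg_2);   // Push c
    gen.Emit (OpCodes.Add);       // Pop (a * b) and c, push the sum
    gen.Emit (OpCodes.Ret);       // Exactly one value left: the return value

    return (Func<int, int, int, int>)
      dynMeth.CreateDelegate (typeof (Func<int, int, int, int>));
  }
}
```

Usage: ArgsDemo.BuildMulAdd() (3, 4, 5) evaluates 3 * 4 + 5, which is 17.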


We demonstrate how to pass by reference in “Emitting Type Members” on page 833. 


Generating Local Variables 


You can declare a local variable by calling DeclareLocal on an ILGenerator. This 
returns a LocalBuilder object, which you can use in conjunction with opcodes 
such as Ldloc (load a local variable) or Stloc (store a local variable). Ldloc pushes 
the evaluation stack; Stloc pops it. For example, consider the following C# code: 


int x = 6;
int y = 7;
x *= y;
Console.WriteLine (x);


The following generates the preceding code dynamically: 


var dynMeth = new DynamicMethod ("Test", null, null, typeof (void));
ILGenerator gen = dynMeth.GetILGenerator();

LocalBuilder localX = gen.DeclareLocal (typeof (int));    // Declare x
LocalBuilder localY = gen.DeclareLocal (typeof (int));    // Declare y

gen.Emit (OpCodes.Ldc_I4, 6);          // Push literal 6 onto eval stack
gen.Emit (OpCodes.Stloc, localX);      // Store in localX
gen.Emit (OpCodes.Ldc_I4, 7);          // Push literal 7 onto eval stack
gen.Emit (OpCodes.Stloc, localY);      // Store in localY

gen.Emit (OpCodes.Ldloc, localX);      // Push localX onto eval stack
gen.Emit (OpCodes.Ldloc, localY);      // Push localY onto eval stack
gen.Emit (OpCodes.Mul);                // Multiply values together
gen.Emit (OpCodes.Stloc, localX);      // Store the result to localX

gen.EmitWriteLine (localX);            // Write the value of localX
gen.Emit (OpCodes.Ret);

dynMeth.Invoke (null, null);           // 42
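The same local-variable code can return x rather than printing it, which makes the result easy to check. In this minimal sketch (LocalsDemo and Build are hypothetical names), the only change is that x is pushed with Ldloc at the end and left on the stack for Ret:

```csharp
using System;
using System.Reflection.Emit;

public static class LocalsDemo
{
  // Emits: int x = 6; int y = 7; x *= y; return x;
  public static Func<int> Build()
  {
    var dynMeth = new DynamicMethod ("Locals", typeof (int), Type.EmptyTypes,
                                     typeof (LocalsDemo));
    ILGenerator gen = dynMeth.GetILGenerator();

    LocalBuilder localX = gen.DeclareLocal (typeof (int));
    LocalBuilder localY = gen.DeclareLocal (typeof (int));

    gen.Emit (OpCodes.Ldc_I4, 6);
    gen.Emit (OpCodes.Stloc, localX);   // x = 6
    gen.Emit (OpCodes.Ldc_I4, 7);
    gen.Emit (OpCodes.Stloc, localY);   // y = 7

    gen.Emit (OpCodes.Ldloc, localX);
    gen.Emit (OpCodes.Ldloc, localY);
    gen.Emit (OpCodes.Mul);
    gen.Emit (OpCodes.Stloc, localX);   // x = x * y

    gen.Emit (OpCodes.Ldloc, localX);   // Leave x on the stack...
    gen.Emit (OpCodes.Ret);             // ...as the return value

    return (Func<int>) dynMeth.CreateDelegate (typeof (Func<int>));
  }
}
```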


Branching 


In IL, there are no while, do, and for loops; it’s all done with labels and the
equivalent of goto and conditional goto statements. These are the branching opcodes,
such as Br (branch unconditionally), Brtrue (branch if the value on the evaluation
stack is true), and Blt (branch if the first value is less than the second value).


To set a branch target, first call DefineLabel (this returns a Label object), and then
call MarkLabel at the place where you want to anchor the label. For example,
consider the following C# code:


int x = 5; 
while (x <= 10) Console.WriteLine (x++); 


We can emit this as follows: 


ILGenerator gen = ...;

Label startLoop = gen.DefineLabel();                  // Declare labels
Label endLoop = gen.DefineLabel();

LocalBuilder x = gen.DeclareLocal (typeof (int));     // int x
gen.Emit (OpCodes.Ldc_I4, 5);                         // Load 5 onto eval stack
gen.Emit (OpCodes.Stloc, x);                          // x = 5
gen.MarkLabel (startLoop);
  gen.Emit (OpCodes.Ldc_I4, 10);                      // Load 10 onto eval stack
  gen.Emit (OpCodes.Ldloc, x);                        // Load x onto eval stack
  gen.Emit (OpCodes.Blt, endLoop);                    // if (x > 10) goto endLoop

  gen.EmitWriteLine (x);                              // Console.WriteLine (x)

  gen.Emit (OpCodes.Ldloc, x);                        // Load x onto eval stack
  gen.Emit (OpCodes.Ldc_I4, 1);                       // Load 1 onto the stack
  gen.Emit (OpCodes.Add);                             // Add them together
  gen.Emit (OpCodes.Stloc, x);                        // Save result back to x

  gen.Emit (OpCodes.Br, startLoop);                   // return to start of loop
gen.MarkLabel (endLoop);

gen.Emit (OpCodes.Ret);
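Here is a variation of the loop, sketched on the assumption that we want a checkable result. Instead of printing x, it accumulates x into a second local and returns the total, so it computes 5 + 6 + 7 + 8 + 9 + 10. LoopDemo and Build are hypothetical names:

```csharp
using System;
using System.Reflection.Emit;

public static class LoopDemo
{
  // Emits: int x = 5, sum = 0; while (x <= 10) sum += x++; return sum;
  public static Func<int> Build()
  {
    var dynMeth = new DynamicMethod ("Loop", typeof (int), Type.EmptyTypes,
                                     typeof (LoopDemo));
    ILGenerator gen = dynMeth.GetILGenerator();

    Label startLoop = gen.DefineLabel();
    Label endLoop = gen.DefineLabel();
    LocalBuilder x = gen.DeclareLocal (typeof (int));
    LocalBuilder sum = gen.DeclareLocal (typeof (int));

    gen.Emit (OpCodes.Ldc_I4, 5);
    gen.Emit (OpCodes.Stloc, x);          // x = 5
    gen.Emit (OpCodes.Ldc_I4_0);
    gen.Emit (OpCodes.Stloc, sum);        // sum = 0

    gen.MarkLabel (startLoop);
    gen.Emit (OpCodes.Ldc_I4, 10);
    gen.Emit (OpCodes.Ldloc, x);
    gen.Emit (OpCodes.Blt, endLoop);      // if (10 < x) goto endLoop

    gen.Emit (OpCodes.Ldloc, sum);
    gen.Emit (OpCodes.Ldloc, x);
    gen.Emit (OpCodes.Add);
    gen.Emit (OpCodes.Stloc, sum);        // sum += x

    gen.Emit (OpCodes.Ldloc, x);
    gen.Emit (OpCodes.Ldc_I4_1);
    gen.Emit (OpCodes.Add);
    gen.Emit (OpCodes.Stloc, x);          // x++
    gen.Emit (OpCodes.Br, startLoop);     // Back to the loop test

    gen.MarkLabel (endLoop);
    gen.Emit (OpCodes.Ldloc, sum);        // return sum
    gen.Emit (OpCodes.Ret);

    return (Func<int>) dynMeth.CreateDelegate (typeof (Func<int>));
  }
}
```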


Instantiating Objects and Calling Instance Methods 


The IL equivalent of new is the Newobj opcode. This takes a constructor and loads
the constructed object onto the evaluation stack. For instance, the following
constructs a StringBuilder:


var dynMeth = new DynamicMethod ("Test", null, null, typeof (void)); 
ILGenerator gen = dynMeth.GetILGenerator(); 


ConstructorInfo ci = typeof (StringBuilder).GetConstructor (new Type[0]); 
gen.Emit (OpCodes.Newobj, ci); 


After loading an object onto the evaluation stack, you can use the Call or Callvirt 
opcode to invoke the object’s instance methods. Extending this example, we'll query 
the StringBuilder’s MaxCapacity property by calling the property’s get accessor 
and then write out the result: 


gen.Emit (OpCodes.Callvirt, typeof (StringBuilder)
                            .GetProperty ("MaxCapacity").GetGetMethod());

gen.Emit (OpCodes.Call, typeof (Console).GetMethod ("WriteLine",
                                 new[] { typeof (int) } ));
gen.Emit (OpCodes.Ret);
dynMeth.Invoke (null, null);        // 2147483647


To emulate C# calling semantics: 


• Use Call to invoke static methods and value type instance methods.


• Use Callvirt to invoke reference type instance methods (whether or not
they're declared virtual).


In our example, we used Callvirt on the StringBuilder instance, even though
MaxCapacity is not virtual. This doesn’t cause an error: it simply performs a
nonvirtual call instead. Always invoking reference type instance methods with
Callvirt avoids risking the opposite condition: invoking a virtual method with Call.
(The risk is real. The author of the target method may later change its declaration.)
Callvirt also has the benefit of checking that the receiver is non-null.
Invoking a virtual method with Call bypasses virtual calling semantics and calls
that method directly. This is rarely desirable and, in effect, violates type safety.


In the following example, we construct a StringBuilder passing in two arguments,
append ", world!" to the StringBuilder, and then call ToString on it:


// We will call: new StringBuilder ("Hello", 1000)

ConstructorInfo ci = typeof (StringBuilder).GetConstructor (
                     new[] { typeof (string), typeof (int) } );

gen.Emit (OpCodes.Ldstr, "Hello");    // Load a string onto the eval stack
gen.Emit (OpCodes.Ldc_I4, 1000);      // Load an int onto the eval stack
gen.Emit (OpCodes.Newobj, ci);        // Construct the StringBuilder

Type[] strT = { typeof (string) };
gen.Emit (OpCodes.Ldstr, ", world!");
gen.Emit (OpCodes.Call, typeof (StringBuilder).GetMethod ("Append", strT));
gen.Emit (OpCodes.Callvirt, typeof (object).GetMethod ("ToString"));
gen.Emit (OpCodes.Call, typeof (Console).GetMethod ("WriteLine", strT));
gen.Emit (OpCodes.Ret);
dynMeth.Invoke (null, null);          // Hello, world!


For fun, we called GetMethod on typeof(object), and then used Callvirt to perform
a virtual method call on ToString. We could have gotten the same result by
calling ToString on the StringBuilder type itself:


gen.Emit (OpCodes.Callvirt, typeof (StringBuilder).GetMethod ("ToString",
                                                             new Type[0] ));


(The empty type array is required in calling GetMethod because StringBuilder
overloads ToString with another signature.)


Had we called object’s ToString method nonvirtually: 


gen.Emit (OpCodes.Call, 
typeof (object).GetMethod ("ToString")); 


the result would have been System.Text.StringBuilder. In other words, we would
have circumvented StringBuilder’s ToString override and called object’s version
directly.
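The StringBuilder example can also be packaged as a dynamic method that returns the string instead of printing it. A sketch with hypothetical names NewobjDemo and Build; it uses Callvirt for both instance calls, following the calling-semantics advice above:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

public static class NewobjDemo
{
  // Emits: return new StringBuilder ("Hello", 1000).Append (", world!").ToString();
  public static Func<string> Build()
  {
    var dynMeth = new DynamicMethod ("Greet", typeof (string), Type.EmptyTypes,
                                     typeof (NewobjDemo));
    ILGenerator gen = dynMeth.GetILGenerator();

    ConstructorInfo ci = typeof (StringBuilder).GetConstructor (
      new[] { typeof (string), typeof (int) });
    Type[] strT = { typeof (string) };

    gen.Emit (OpCodes.Ldstr, "Hello");
    gen.Emit (OpCodes.Ldc_I4, 1000);
    gen.Emit (OpCodes.Newobj, ci);              // StringBuilder is now on the stack
    gen.Emit (OpCodes.Ldstr, ", world!");
    gen.Emit (OpCodes.Callvirt,                 // Append returns the same builder
      typeof (StringBuilder).GetMethod ("Append", strT));
    gen.Emit (OpCodes.Callvirt,                 // Virtual call: StringBuilder's override runs
      typeof (object).GetMethod ("ToString"));
    gen.Emit (OpCodes.Ret);                     // Return the resulting string

    return (Func<string>) dynMeth.CreateDelegate (typeof (Func<string>));
  }
}
```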


Exception Handling 


ILGenerator provides dedicated methods for exception handling. Thus, the
translation for this C# code:


try { throw new NotSupportedException(); } 

catch (NotSupportedException ex) { Console.WriteLine (ex.Message); } 

finally { Console.WriteLine ("Finally"); } 
is this: 


MethodInfo getMessageProp = typeof (NotSupportedException)
                            .GetProperty ("Message").GetGetMethod();

MethodInfo writeLineString = typeof (Console).GetMethod ("WriteLine",
                             new[] { typeof (object) } );
gen.BeginExceptionBlock();
  ConstructorInfo ci = typeof (NotSupportedException).GetConstructor (
                                                        new Type[0] );
  gen.Emit (OpCodes.Newobj, ci);
  gen.Emit (OpCodes.Throw);
gen.BeginCatchBlock (typeof (NotSupportedException));
  gen.Emit (OpCodes.Callvirt, getMessageProp);
  gen.Emit (OpCodes.Call, writeLineString);
gen.BeginFinallyBlock();
  gen.EmitWriteLine ("Finally");
gen.EndExceptionBlock();


Just as in C#, you can include multiple catch blocks. To rethrow the same
exception, emit the Rethrow opcode.


ILGenerator provides a helper method called ThrowException. This contains a
bug, however, preventing it from being used with a DynamicMethod. It works only
with a MethodBuilder (see the next section).
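Returning a value from code that contains an exception block takes one extra step. Ret cannot appear inside a protected region (the IL rules require Leave to exit a try or catch block), so the result is stored in a local and returned after EndExceptionBlock. A minimal sketch; TryCatchDemo, Build, and the message text are hypothetical:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TryCatchDemo
{
  // Emits the equivalent of:
  //   string result;
  //   try { throw new NotSupportedException ("Not today"); }
  //   catch (NotSupportedException ex) { result = ex.Message; }
  //   return result;
  public static Func<string> Build()
  {
    var dynMeth = new DynamicMethod ("TryCatch", typeof (string), Type.EmptyTypes,
                                     typeof (TryCatchDemo));
    ILGenerator gen = dynMeth.GetILGenerator();
    LocalBuilder result = gen.DeclareLocal (typeof (string));

    ConstructorInfo ctor = typeof (NotSupportedException).GetConstructor (
      new[] { typeof (string) });
    MethodInfo getMessage = typeof (Exception).GetProperty ("Message").GetGetMethod();

    gen.BeginExceptionBlock();
    gen.Emit (OpCodes.Ldstr, "Not today");
    gen.Emit (OpCodes.Newobj, ctor);
    gen.Emit (OpCodes.Throw);

    // On entry to the catch block, the exception object is on the stack.
    gen.BeginCatchBlock (typeof (NotSupportedException));
    gen.Emit (OpCodes.Callvirt, getMessage);   // Replace the exception with its Message
    gen.Emit (OpCodes.Stloc, result);
    gen.EndExceptionBlock();                   // Emits the Leave out of the catch block

    // Return only after leaving the protected region.
    gen.Emit (OpCodes.Ldloc, result);
    gen.Emit (OpCodes.Ret);

    return (Func<string>) dynMeth.CreateDelegate (typeof (Func<string>));
  }
}
```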


Emitting Assemblies and Types 


Although DynamicMethod is convenient, it can generate only methods. If you need
to emit any other construct (or a complete type), you need to use the full
“heavyweight” API. This means dynamically building an assembly and module. The
assembly need not have a disk presence (in fact it cannot, because .NET Core 3 does
not let you save generated assemblies to disk).


Let’s assume that we want to dynamically build a type. Because a type must reside in
a module within an assembly, we first must create the assembly and module before
we can create the type. This is the job of the AssemblyBuilder and ModuleBuilder
types:


AssemblyName aname = new AssemblyName ("MyDynamicAssembly");

AssemblyBuilder assemBuilder =
  AssemblyBuilder.DefineDynamicAssembly (aname, AssemblyBuilderAccess.Run);

ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("DynModule");


You can't add a type to an existing assembly, because an 
assembly is immutable after it’s created. 


Dynamic assemblies are not garbage-collected and remain in memory until the
process ends, unless you specify AssemblyBuilderAccess.RunAndCollect when
defining the assembly. Various restrictions apply to collectible assemblies (see
http://albahari.com/dynamiccollect).
After we have a module in which the type can reside, we can use TypeBuilder to
create the type. The following defines a class called Widget:


TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public); 


The TypeAttributes flags enum supports the CLR type modifiers you see when
disassembling a type with ildasm. As well as member visibility flags, this includes
type modifiers such as Abstract and Sealed, and also Interface for defining a .NET
interface. It also includes Serializable, which is equivalent to applying the
[Serializable] attribute in C#, and Explicit, which is equivalent to applying
[StructLayout(LayoutKind.Explicit)]. We describe how to apply other kinds of
attributes later in this chapter, in “Attaching Attributes” on page 838.


The DefineType method also accepts an optional base type: 


• To define a struct, specify a base type of System.ValueType.


• To define a delegate, specify a base type of System.MulticastDelegate.


• To implement an interface, use the DefineType overload that accepts an array
of interface types.


• To define an interface, specify TypeAttributes.Interface | TypeAttributes.Abstract.


Defining a delegate type requires a number of extra steps. In 
his weblog, Joel Pobar demonstrates how this is done in his 
article titled “Creating delegate types via Reflection.Emit.” 


We can now create members within the type: 


MethodBuilder methBuilder = tb.DefineMethod ("SayHello", 
MethodAttributes.Public, 
null, null); 

ILGenerator gen = methBuilder.GetILGenerator(); 

gen.EmitWriteLine ("Hello world"); 

gen.Emit (OpCodes.Ret); 


We're now ready to create the type, which finalizes its definition: 
Type t = tb.CreateType(); 


After the type is created, we can use ordinary reflection to inspect and perform late 
binding: 


object o = Activator.CreateInstance (t);
t.GetMethod ("SayHello").Invoke (o, null);        // Hello world
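Putting the preceding steps together gives the following self-contained sketch, which builds the Widget type end to end. TypeBuilderDemo and BuildWidget are hypothetical names, and SayHello is assumed to return its greeting rather than write it, so that the result can be checked:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TypeBuilderDemo
{
  // Builds: public class Widget { public string SayHello() => "Hello world"; }
  public static Type BuildWidget()
  {
    var aname = new AssemblyName ("MyDynamicAssembly");
    AssemblyBuilder assemBuilder =
      AssemblyBuilder.DefineDynamicAssembly (aname, AssemblyBuilderAccess.Run);
    ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("DynModule");

    TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public);

    MethodBuilder methBuilder = tb.DefineMethod ("SayHello",
      MethodAttributes.Public, typeof (string), Type.EmptyTypes);
    ILGenerator gen = methBuilder.GetILGenerator();
    gen.Emit (OpCodes.Ldstr, "Hello world");   // Leave the string on the stack
    gen.Emit (OpCodes.Ret);                    // and return it

    // Finalizes the type; a default constructor is supplied automatically.
    return tb.CreateType();
  }
}
```

Usage: create an instance with Activator.CreateInstance, and then invoke SayHello through ordinary reflection, exactly as in the preceding snippet.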


The Reflection.Emit Object Model 


Figure 19-2 illustrates the essential types in System.Reflection.Emit. Each type
describes a CLR construct and is based on a counterpart in the System.Reflection
namespace. This allows you to use emitted constructs in place of normal constructs
when building a type. For example, we previously called Console.WriteLine as
follows:


MethodInfo writeLine = typeof (Console).GetMethod ("WriteLine",
                         new Type[] { typeof (string) });
gen.Emit (OpCodes.Call, writeLine);


We could just as easily call a dynamically generated method by calling gen.Emit
with a MethodBuilder instead of a MethodInfo. This is essential; otherwise, you
couldn’t write one dynamic method that called another in the same type.
Figure 19-2. System.Reflection.Emit


Recall that you must call CreateType on a TypeBuilder when you’ve finished
populating it. Calling CreateType seals the TypeBuilder and all its members (so
nothing more can be added or changed) and gives you back a real Type that you can
instantiate.


Before you call CreateType, the TypeBuilder and its members are in an uncreated
state. There are significant restrictions on what you can do with uncreated
constructs. In particular, you cannot call any of the members that return MemberInfo
objects, such as GetMembers, GetMethod, or GetProperty; these all throw an
exception. If you want to refer to members of an uncreated type, you must use the
original emissions:


TypeBuilder tb = ...;

MethodBuilder method1 = tb.DefineMethod ("Method1", ...);
MethodBuilder method2 = tb.DefineMethod ("Method2", ...);

ILGenerator gen1 = method1.GetILGenerator();

// Suppose we want method1 to call method2:

gen1.Emit (OpCodes.Call, method2);                    // Right
gen1.Emit (OpCodes.Call, tb.GetMethod ("Method2"));   // Wrong
After calling CreateType, you can reflect on and activate not only the Type
returned, but also the original TypeBuilder object. The TypeBuilder, in fact,
morphs into a proxy for the real Type. You’ll see why this feature is important in
“Awkward Emission Targets” on page 840.
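The following sketch completes the method1/method2 fragment under stated assumptions: both methods are public, static, and return int. Method1 calls Method2 through its MethodBuilder while the type is still uncreated, and after CreateType the original TypeBuilder is used for ordinary reflection. UncreatedTypeDemo, BuildCalculator, and Calculator are hypothetical names:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class UncreatedTypeDemo
{
  public static TypeBuilder BuildCalculator()
  {
    AssemblyBuilder assemBuilder = AssemblyBuilder.DefineDynamicAssembly (
      new AssemblyName ("CalcAssembly"), AssemblyBuilderAccess.Run);
    ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("CalcModule");
    TypeBuilder tb = modBuilder.DefineType ("Calculator", TypeAttributes.Public);

    MethodAttributes attrs = MethodAttributes.Public | MethodAttributes.Static;
    MethodBuilder method1 = tb.DefineMethod ("Method1", attrs, typeof (int), Type.EmptyTypes);
    MethodBuilder method2 = tb.DefineMethod ("Method2", attrs, typeof (int), Type.EmptyTypes);

    // Method2: return 21;
    ILGenerator gen2 = method2.GetILGenerator();
    gen2.Emit (OpCodes.Ldc_I4, 21);
    gen2.Emit (OpCodes.Ret);

    // Method1: return Method2() * 2;  (refers to the MethodBuilder directly)
    ILGenerator gen1 = method1.GetILGenerator();
    gen1.Emit (OpCodes.Call, method2);
    gen1.Emit (OpCodes.Ldc_I4_2);
    gen1.Emit (OpCodes.Mul);
    gen1.Emit (OpCodes.Ret);

    tb.CreateType();
    return tb;   // After CreateType, tb acts as a proxy for the real Type
  }
}
```

Usage: BuildCalculator().GetMethod ("Method1").Invoke (null, null) succeeds because the TypeBuilder has been created; calling GetMethod before CreateType would throw.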


Emitting Type Members 


All the examples in this section assume a TypeBuilder, tb, has been instantiated as 
follows: 


AssemblyName aname = new AssemblyName ("MyEmissions");

AssemblyBuilder assemBuilder = AssemblyBuilder.DefineDynamicAssembly (
  aname, AssemblyBuilderAccess.Run);

ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("MainModule");

TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public);


Emitting Methods 


You can specify a return type and parameter types when calling DefineMethod, in
the same manner as when instantiating a DynamicMethod. For instance, the
following method:


public static double SquareRoot (double value) => Math.Sqrt (value); 
can be generated like this: 


MethodBuilder mb = tb.DefineMethod ("SquareRoot",
  MethodAttributes.Static | MethodAttributes.Public,
  CallingConventions.Standard,
  typeof (double),                     // Return type
  new[] { typeof (double) } );         // Parameter types

mb.DefineParameter (1, ParameterAttributes.None, "value");    // Assign name

ILGenerator gen = mb.GetILGenerator();
gen.Emit (OpCodes.Ldarg_0);                                   // Load 1st arg
gen.Emit (OpCodes.Call, typeof (Math).GetMethod ("Sqrt"));
gen.Emit (OpCodes.Ret);

Type realType = tb.CreateType();
double x = (double) tb.GetMethod ("SquareRoot").Invoke (null,
                                                new object[] { 10.0 });
Console.WriteLine (x);   // 3.16227766016838


Calling DefineParameter is optional and is typically done to assign the parameter a
name. The number 1 refers to the first parameter (0 refers to the return value). If
you don’t call DefineParameter, the parameters are implicitly named __p1, __p2,
and so on. Assigning names makes sense if you will write the assembly to disk; it
makes your methods friendly to consumers.
DefineParameter returns a ParameterBuilder object upon which you can call
SetCustomAttribute to attach attributes (see “Attaching Attributes” on page 838).
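As with DynamicMethod, a method emitted into a created type can be bound to a typed delegate instead of being called through Invoke each time. A minimal sketch, assuming the same SquareRoot method; EmitMethodDemo and BuildSquareRoot are hypothetical names, and the delegate is created from the MethodInfo of the real (created) type:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitMethodDemo
{
  public static Func<double, double> BuildSquareRoot()
  {
    AssemblyBuilder assemBuilder = AssemblyBuilder.DefineDynamicAssembly (
      new AssemblyName ("MyEmissions"), AssemblyBuilderAccess.Run);
    ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("MainModule");
    TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public);

    MethodBuilder mb = tb.DefineMethod ("SquareRoot",
      MethodAttributes.Static | MethodAttributes.Public,
      CallingConventions.Standard,
      typeof (double),                 // Return type
      new[] { typeof (double) });      // Parameter types
    mb.DefineParameter (1, ParameterAttributes.None, "value");

    ILGenerator gen = mb.GetILGenerator();
    gen.Emit (OpCodes.Ldarg_0);                                 // Load 1st arg
    gen.Emit (OpCodes.Call, typeof (Math).GetMethod ("Sqrt"));  // Math.Sqrt (value)
    gen.Emit (OpCodes.Ret);

    Type realType = tb.CreateType();
    MethodInfo sqrt = realType.GetMethod ("SquareRoot");

    // Bind once to a typed delegate rather than paying for Invoke on every call.
    return (Func<double, double>) sqrt.CreateDelegate (typeof (Func<double, double>));
  }
}
```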


To emit pass-by-reference parameters, such as in the following C# method: 


public static void SquareRoot (ref double value) 
=> value = Math.Sqrt (value); 


call MakeByRefType on the parameter type(s): 


MethodBuilder mb = tb.DefineMethod ("SquareRoot",
  MethodAttributes.Static | MethodAttributes.Public,
  CallingConventions.Standard,
  null,
  new Type[] { typeof (double).MakeByRefType() } );

mb.DefineParameter (1, ParameterAttributes.None, "value");

ILGenerator gen = mb.GetILGenerator();
gen.Emit (OpCodes.Ldarg_0);
gen.Emit (OpCodes.Ldarg_0);
gen.Emit (OpCodes.Ldind_R8);
gen.Emit (OpCodes.Call, typeof (Math).GetMethod ("Sqrt"));
gen.Emit (OpCodes.Stind_R8);
gen.Emit (OpCodes.Ret);

Type realType = tb.CreateType();
object[] args = { 10.0 };
tb.GetMethod ("SquareRoot").Invoke (null, args);
Console.WriteLine (args[0]);                     // 3.16227766016838


The opcodes here were copied from a disassembled C# method. Notice the
difference in semantics for accessing parameters passed by reference: Ldind and
Stind mean “load indirectly” and “store indirectly,” respectively. The R8 suffix
means an eight-byte floating-point number.


The process for emitting out parameters is identical, except that you call
DefineParameter as follows:


mb.DefineParameter (1, ParameterAttributes.Out, "value"); 


Generating instance methods 


To generate an instance method, omit MethodAttributes.Static when calling
DefineMethod (there is no MethodAttributes.Instance flag; the absence of Static is
what makes a method an instance method):


MethodBuilder mb = tb.DefineMethod ("SquareRoot",
                                    MethodAttributes.Public,    // No Static flag
                                    ...


With instance methods, argument zero is implicitly this; the remaining arguments 
start at 1. So, Ldarg_0 loads this onto the evaluation stack; Ldarg_1 loads the first 
real method argument. 
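A complete instance method, sketched under the assumption of a hypothetical Doubler type with a Twice method (InstanceMethodDemo and BuildDoubler are also hypothetical names). Because Static is omitted, the method receives this as argument 0, and its declared int parameter is argument 1:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class InstanceMethodDemo
{
  // Builds: public class Doubler { public int Twice (int x) => x * 2; }
  public static Type BuildDoubler()
  {
    AssemblyBuilder assemBuilder = AssemblyBuilder.DefineDynamicAssembly (
      new AssemblyName ("DoublerAssembly"), AssemblyBuilderAccess.Run);
    ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("MainModule");
    TypeBuilder tb = modBuilder.DefineType ("Doubler", TypeAttributes.Public);

    // No MethodAttributes.Static flag, so this is an instance method.
    MethodBuilder mb = tb.DefineMethod ("Twice", MethodAttributes.Public,
      typeof (int), new[] { typeof (int) });

    ILGenerator gen = mb.GetILGenerator();
    gen.Emit (OpCodes.Ldarg_1);     // Argument 0 is 'this'; argument 1 is x
    gen.Emit (OpCodes.Ldc_I4_2);
    gen.Emit (OpCodes.Mul);
    gen.Emit (OpCodes.Ret);

    return tb.CreateType();
  }
}
```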
Overriding methods


Overriding a virtual method in a base class is easy: simply define a method with an
identical name, signature, and return type, specifying MethodAttributes.Virtual
when calling DefineMethod. The same applies when implementing interface
methods.


TypeBuilder also exposes a method called DefineMethodOverride, which overrides 
a method with a different name. This makes sense only with explicit interface 
implementation; in other scenarios, use DefineMethod. 


HideBySig 


If you're subclassing another type, it’s nearly always worth specifying
MethodAttributes.HideBySig when defining methods. HideBySig ensures that C#-style
method hiding semantics are applied, which is that a base method is hidden only if a
subtype defines a method with an identical signature. Without HideBySig, method
hiding considers only the name, so Foo(string) in the subtype will hide Foo() in
the base type, which is generally undesirable.


Emitting Fields and Properties 


To create a field, you call DefineField on a TypeBuilder, specifying the desired 
field name, type, and visibility. The following creates a private integer field called 
length: 


FieldBuilder field = tb.DefineField ("length", typeof (int),
                                     FieldAttributes.Private);


Creating a property or indexer requires a few more steps. First, call DefineProperty 
on a TypeBuilder, providing it with the name and type of the property: 


PropertyBuilder prop = tb.DefineProperty (
                         "Text",                    // Name of property
                         PropertyAttributes.None,
                         typeof (string),           // Property type
                         new Type[0]                // Indexer types
                       );


(If you're writing an indexer, the final argument is an array of indexer types.) Note 
that we haven't specified the property visibility: this is done individually on the 
accessor methods. 


The next step is to write the get and set methods. By convention, their names are
prefixed with “get_” or “set_”. You then attach them to the property by calling
SetGetMethod and SetSetMethod on the PropertyBuilder.


To give a complete example, let’s take the following field and property declaration: 


string _text;
public string Text
{
  get          => _text;
  internal set => _text = value;
}


and generate it dynamically: 


FieldBuilder field = tb.DefineField ("_text", typeof (string),
                                     FieldAttributes.Private);
PropertyBuilder prop = tb.DefineProperty (
                         "Text",                      // Name of property
                         PropertyAttributes.None,
                         typeof (string),             // Property type
                         new Type[0]);                // Indexer types

MethodBuilder getter = tb.DefineMethod (
  "get_Text",                                         // Method name
  MethodAttributes.Public | MethodAttributes.SpecialName,
  typeof (string),                                    // Return type
  new Type[0]);                                       // Parameter types

ILGenerator getGen = getter.GetILGenerator();
getGen.Emit (OpCodes.Ldarg_0);        // Load "this" onto eval stack
getGen.Emit (OpCodes.Ldfld, field);   // Load field value onto eval stack
getGen.Emit (OpCodes.Ret);            // Return

MethodBuilder setter = tb.DefineMethod (
  "set_Text",
  MethodAttributes.Assembly | MethodAttributes.SpecialName,
  null,                                               // Return type
  new Type[] { typeof (string) } );                   // Parameter types

ILGenerator setGen = setter.GetILGenerator();
setGen.Emit (OpCodes.Ldarg_0);        // Load "this" onto eval stack
setGen.Emit (OpCodes.Ldarg_1);        // Load 2nd arg, i.e., value
setGen.Emit (OpCodes.Stfld, field);   // Store value into field
setGen.Emit (OpCodes.Ret);            // return

prop.SetGetMethod (getter);           // Link the get method and property
prop.SetSetMethod (setter);           // Link the set method and property


We can test the property as follows: 


Type t = tb.CreateType();
object o = Activator.CreateInstance (t);
t.GetProperty ("Text").SetValue (o, "Good emissions!", new object[0]);
string text = (string) t.GetProperty ("Text").GetValue (o, null);

Console.WriteLine (text);             // Good emissions!


Notice that in defining the accessor MethodAttributes, we included SpecialName.
This instructs compilers to disallow direct binding to these methods when statically
referencing the assembly. It also ensures that the accessors are handled
appropriately by reflection tools and Visual Studio’s IntelliSense.
You can emit events in a similar manner, by calling DefineEvent on a TypeBuilder.
You then write explicit event accessor methods and attach them to the
EventBuilder by calling SetAddOnMethod and SetRemoveOnMethod.


Emitting Constructors 


You can define your own constructors by calling DefineConstructor on a type
builder. You’re not obliged to do so: a default parameterless constructor is
automatically provided if you don’t. The default constructor calls the base class
constructor if subtyping, just like in C#. Defining one or more constructors
displaces this default constructor.


If you need to initialize fields, the constructor’s a good spot. In fact, it’s the only
spot: C#’s field initializers don’t have special CLR support; they are simply a
syntactic shortcut for assigning values to fields in the constructor.


So, to reproduce this: 


class Widget
{
  int _capacity = 4000;
}


you would define a constructor as follows: 


FieldBuilder field = tb.DefineField ("_capacity", typeof (int),
                                     FieldAttributes.Private);
ConstructorBuilder c = tb.DefineConstructor (
  MethodAttributes.Public,
  CallingConventions.Standard,
  new Type[0]);                       // Constructor parameters

ILGenerator gen = c.GetILGenerator();

gen.Emit (OpCodes.Ldarg_0);           // Load "this" onto eval stack
gen.Emit (OpCodes.Ldc_I4, 4000);      // Load 4000 onto eval stack
gen.Emit (OpCodes.Stfld, field);      // Store it to our field
gen.Emit (OpCodes.Ret);


Calling base constructors 


If subclassing another type, the constructor we just wrote would circumvent the base 
class constructor. This is unlike C#, in which the base class constructor is always 
called, whether directly or indirectly. For instance, given the following code: 


class A { public A() { Console.Write ("A"); } } 
class B : A { public B() {} } 


the compiler, in effect, will translate the second line into this: 


class B : A { public B() : base() {} } 
This is not the case when generating IL: you must explicitly call the base constructor
if you want it to execute (which you nearly always do). Assuming the base class is
called A, here’s how to do it:


gen.Emit (OpCodes.Ldarg_0); 
ConstructorInfo baseConstr = typeof (A).GetConstructor (new Type[0]); 
gen.Emit (OpCodes.Call, baseConstr); 


Calling constructors with arguments is just the same as with methods. 
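Here is a self-contained sketch of that pattern. It assumes a statically compiled public base class named Animal (a stand-in for the A above) whose constructor records that it ran; the base class must be public so that the dynamic assembly is allowed to derive from it. The emitted Dog subclass passes the check only because its constructor loads this and calls Animal's parameterless constructor.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Illustrative base class; its constructor leaves a visible trace.
public class Animal
{
    public string Origin = "unset";
    public Animal() { Origin = "set by Animal()"; }
}

public static class BaseCtorDemo
{
    // Emits: public class Dog : Animal { public Dog() : base() {} }
    public static Animal CreateDog()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("BaseCtorDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("MainModule");
        TypeBuilder tb = mb.DefineType("Dog", TypeAttributes.Public, typeof(Animal));

        ConstructorBuilder c = tb.DefineConstructor(
            MethodAttributes.Public, CallingConventions.Standard, Type.EmptyTypes);
        ILGenerator gen = c.GetILGenerator();

        // Without these two instructions, Animal's constructor would never run.
        gen.Emit(OpCodes.Ldarg_0);
        ConstructorInfo baseConstr = typeof(Animal).GetConstructor(Type.EmptyTypes);
        gen.Emit(OpCodes.Call, baseConstr);
        gen.Emit(OpCodes.Ret);

        Type dogType = tb.CreateType();
        return (Animal)Activator.CreateInstance(dogType);
    }
}
```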


Attaching Attributes 


You can attach custom attributes to a dynamic construct by calling SetCustomAttribute
with a CustomAttributeBuilder. For example, suppose that we want to
attach the following attribute declaration to a field or property:


[XmlElement ("FirstName", Namespace="http://test/", Order=3)] 


This relies on the XmlElementAttribute constructor that accepts a single string. To
use CustomAttributeBuilder, we must retrieve this constructor as well as the two
additional properties that we want to set (Namespace and Order):


Type attType = typeof (XmlElementAttribute);

ConstructorInfo attConstructor = attType.GetConstructor (
new Type[] { typeof (string) } );

var att = new CustomAttributeBuilder (
attConstructor, // Constructor
new object[] { "FirstName" }, // Constructor arguments
new PropertyInfo[]
{
attType.GetProperty ("Namespace"), // Properties
attType.GetProperty ("Order")
},
new object[] { "http://test/", 3 } // Property values
);


myFieldBuilder.SetCustomAttribute (att); 
// or propBuilder.SetCustomAttribute (att); 
// or typeBuilder.SetCustomAttribute (att); etc 
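To confirm that the attribute lands where you expect, you can emit a small type, attach the builder to one of its fields, and read the attribute back after calling CreateType. In this sketch, the type name Contact, the field name Name, and the AttributeDemo helper are assumptions made for illustration; XmlElementAttribute lives in System.Xml.Serialization.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Xml.Serialization;

public static class AttributeDemo
{
    // Emits a type with one public string field decorated with
    // [XmlElement ("FirstName", Namespace="http://test/", Order=3)],
    // then returns the attribute as seen through reflection.
    public static XmlElementAttribute EmitAndReadBack()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("AttributeDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("MainModule");
        TypeBuilder tb = mb.DefineType("Contact", TypeAttributes.Public);
        FieldBuilder fb = tb.DefineField("Name", typeof(string), FieldAttributes.Public);

        Type attType = typeof(XmlElementAttribute);
        var att = new CustomAttributeBuilder(
            attType.GetConstructor(new[] { typeof(string) }),                          // Constructor
            new object[] { "FirstName" },                                              // Constructor arguments
            new[] { attType.GetProperty("Namespace"), attType.GetProperty("Order") },  // Properties
            new object[] { "http://test/", 3 });                                       // Property values
        fb.SetCustomAttribute(att);

        Type contact = tb.CreateType();
        return contact.GetField("Name").GetCustomAttribute<XmlElementAttribute>();
    }
}
```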


Emitting Generic Methods and Types 


All the examples in this section assume that modBuilder has been instantiated as 
follows: 


AssemblyName aname = new AssemblyName ("MyEmissions");

AssemblyBuilder assemBuilder = AssemblyBuilder.DefineDynamicAssembly (
aname, AssemblyBuilderAccess.Run);

ModuleBuilder modBuilder = assemBuilder.DefineDynamicModule ("MainModule");
Defining Generic Methods
Follow these steps to emit a generic method: 
1. Call DefineGenericParameters on a MethodBuilder to obtain an array of 
GenericTypeParameterBuilder objects. 
2. Call SetSignature on a MethodBuilder using these generic type parameters. 


3. Optionally, name the parameters as you would otherwise. 


For example, the following generic method: 


public static T Echo<T> (T value) 
{ 


return value; 


} 


can be emitted like this: 


TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public) ; 


MethodBuilder mb = tb.DefineMethod ("Echo", MethodAttributes.Public | 
MethodAttributes.Static); 
GenericTypeParameterBuilder[] genericParams 
= mb.DefineGenericParameters ("T"); 


mb.SetSignature (genericParams[0], // Return type 
null, null, 
genericParams, // Parameter types 


null, null); 
mb.DefineParameter (1, ParameterAttributes.None, "value"); // Optional 


ILGenerator gen = mb.GetILGenerator(); 
gen.Emit (OpCodes.Ldarg_0); 
gen.Emit (OpCodes.Ret); 
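Because Echo is still an open generic method after CreateType, you must close it with MakeGenericMethod before invoking it. The following sketch (the GenericMethodDemo helper and its assembly name are hypothetical) emits Echo<T> exactly as above and then calls it with a type argument chosen at runtime:

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class GenericMethodDemo
{
    // Emits Widget.Echo<T> (T value) => value, closes it over typeArg, and invokes it.
    public static object EchoVia(Type typeArg, object value)
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("MyEmissions"), AssemblyBuilderAccess.Run);
        ModuleBuilder modBuilder = ab.DefineDynamicModule("MainModule");
        TypeBuilder tb = modBuilder.DefineType("Widget", TypeAttributes.Public);

        MethodBuilder mb = tb.DefineMethod("Echo", MethodAttributes.Public | MethodAttributes.Static);
        GenericTypeParameterBuilder[] genericParams = mb.DefineGenericParameters("T");
        mb.SetSignature(genericParams[0], null, null, genericParams, null, null);
        mb.DefineParameter(1, ParameterAttributes.None, "value");

        ILGenerator gen = mb.GetILGenerator();
        gen.Emit(OpCodes.Ldarg_0);   // Load the argument...
        gen.Emit(OpCodes.Ret);       // ...and return it

        Type widget = tb.CreateType();
        MethodInfo closed = widget.GetMethod("Echo").MakeGenericMethod(typeArg);
        return closed.Invoke(null, new[] { value });
    }
}
```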


The DefineGenericParameters method accepts any number of string arguments— 
these correspond to the desired generic type names. In this example, we needed just 
one generic type called T. GenericTypeParameterBuilder is based on System.Type,
so you can use it in place of a TypeBuilder when emitting opcodes.


GenericTypeParameterBuilder also lets you specify a base type constraint: 
genericParams[0].SetBaseTypeConstraint (typeof (Foo)); 
and interface constraints: 
genericParams[0].SetInterfaceConstraints (typeof (IComparable)); 
To replicate this: 
public static T Echo<T> (T value) where T : IComparable<T> 


you would write: 
genericParams[0].SetInterfaceConstraints (
typeof (IComparable<>).MakeGenericType (genericParams[0]) ); 


For other kinds of constraints, call SetGenericParameterAttributes. This accepts 
a member of the GenericParameterAttributes enum, which includes the following 
values: 


DefaultConstructorConstraint 
NotNullableValueTypeConstraint 
ReferenceTypeConstraint 
Covariant 

Contravariant 


The last two are equivalent to applying the out and in modifiers to the type 
parameters. 
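Putting the constraint APIs together, the sketch below emits Echo<T> with the equivalent of where T : class, IComparable<T>, and then inspects the created method's generic parameter. The ConstraintDemo helper is hypothetical. Note that the reference-type constraint is expressed as a flag, whereas the interface constraint is a type that refers back to T itself.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ConstraintDemo
{
    // Emits: public static T Echo<T> (T value) where T : class, IComparable<T>
    public static MethodInfo BuildConstrainedEcho()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("ConstraintDemo"), AssemblyBuilderAccess.Run);
        TypeBuilder tb = ab.DefineDynamicModule("MainModule")
                           .DefineType("Widget", TypeAttributes.Public);
        MethodBuilder mb = tb.DefineMethod("Echo", MethodAttributes.Public | MethodAttributes.Static);
        GenericTypeParameterBuilder[] gp = mb.DefineGenericParameters("T");

        // where T : IComparable<T>
        gp[0].SetInterfaceConstraints(typeof(IComparable<>).MakeGenericType(gp[0]));
        // where T : class
        gp[0].SetGenericParameterAttributes(GenericParameterAttributes.ReferenceTypeConstraint);

        mb.SetSignature(gp[0], null, null, gp, null, null);
        ILGenerator gen = mb.GetILGenerator();
        gen.Emit(OpCodes.Ldarg_0);
        gen.Emit(OpCodes.Ret);

        return tb.CreateType().GetMethod("Echo");
    }
}
```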


Defining Generic Types 


You can define generic types in a similar fashion. The difference is that you call 
DefineGenericParameters on the TypeBuilder rather than the MethodBuilder. So, 
to reproduce this: 


public class Widget<T> 


{ 
public T Value; 


} 


you would do the following: 


TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public) ; 


GenericTypeParameterBuilder[] genericParams 
= tb.DefineGenericParameters ("T"); 


tb.DefineField ("Value", genericParams[0], FieldAttributes.Public); 


Generic constraints can be added, just as with a method. 
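As with any generic type definition, the created Widget<T> must be closed with MakeGenericType before you can instantiate it. Here is a minimal sketch with a hypothetical GenericTypeDemo helper; because no constructor is defined, CreateType supplies a public parameterless one, as described earlier.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class GenericTypeDemo
{
    // Emits: public class Widget<T> { public T Value; }
    public static Type BuildOpenWidget()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("GenericTypeDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder modBuilder = ab.DefineDynamicModule("MainModule");
        TypeBuilder tb = modBuilder.DefineType("Widget", TypeAttributes.Public);

        GenericTypeParameterBuilder[] genericParams = tb.DefineGenericParameters("T");
        tb.DefineField("Value", genericParams[0], FieldAttributes.Public);

        // Returns the open (unbound) generic type definition.
        return tb.CreateType();
    }
}
```

A closed version such as Widget<int> is then an ordinary runtime type that you can instantiate and populate through reflection.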


Awkward Emission Targets 


All of the examples in this section assume that a modBuilder has been instantiated 
as in previous sections. 


Uncreated Closed Generics 


Suppose that you want to emit a method that uses a closed generic type: 


public class Widget 
{ 


public static void Test() { var list = new List<int>(); } 


} 


The process is fairly straightforward: 
TypeBuilder tb = modBuilder.DefineType ("Widget", TypeAttributes.Public);


MethodBuilder mb = tb.DefineMethod ("Test", MethodAttributes.Public | 
MethodAttributes.Static); 
ILGenerator gen = mb.GetILGenerator(); 


Type variableType = typeof (List<int>); 
ConstructorInfo ci = variableType.GetConstructor (new Type[0]); 


LocalBuilder listVar = gen.DeclareLocal (variableType) ; 
gen.Emit (OpCodes.Newobj, ci); 

gen.Emit (OpCodes.Stloc, listVar); 

gen.Emit (OpCodes.Ret); 


Now suppose that instead of a list of integers, we want a list of widgets: 


public class Widget 


{ 
public static void Test() { var list = new List<Widget>(); } 


} 


In theory, this is a simple modification; all we do is replace this line: 
Type variableType = typeof (List<int>); 

with this one: 
Type variableType = typeof (List<>).MakeGenericType (tb); 


Unfortunately, this causes a NotSupportedException to be thrown when we then 
call GetConstructor. The problem is that you cannot call GetConstructor on a 
generic type closed with an uncreated type builder. The same goes for GetField and 
GetMethod. 


The solution is unintuitive. TypeBuilder provides three static methods: 


public static ConstructorInfo GetConstructor (Type, ConstructorInfo); 
public static FieldInfo GetField (Type, FieldInfo); 
public static MethodInfo GetMethod (Type, MethodInfo) ; 


Although it doesn’t appear so, these methods exist specifically to obtain members of 
generic types closed with uncreated type builders! The first parameter is the closed 
generic type; the second parameter is the member that you want on the unbound 
generic type. Here's the corrected version of our example: 


MethodBuilder mb = tb.DefineMethod ("Test", MethodAttributes.Public | 
MethodAttributes.Static); 
ILGenerator gen = mb.GetILGenerator(); 


Type variableType = typeof (List<>).MakeGenericType (tb); 


ConstructorInfo unbound = typeof (List<>).GetConstructor (new Type[0]); 
ConstructorInfo ci = TypeBuilder.GetConstructor (variableType, unbound); 


LocalBuilder listVar = gen.DeclareLocal (variableType) ; 
gen.Emit (OpCodes.Newobj, ci); 
gen.Emit (OpCodes.Stloc, listVar);
gen.Emit (OpCodes.Ret); 
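The local variable in the example above is never observable, so the following variation returns the list instead of storing it. This is an illustrative sketch (the UncreatedGenericDemo name is hypothetical) that lets you check that the emitted code really constructs a List<Widget> over the created Widget type:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class UncreatedGenericDemo
{
    // Emits: public class Widget { public static object Test() => new List<Widget>(); }
    // and returns the created Widget type plus the result of calling Test.
    public static (Type widget, object list) Run()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("UncreatedGenericDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder modBuilder = ab.DefineDynamicModule("MainModule");
        TypeBuilder tb = modBuilder.DefineType("Widget", TypeAttributes.Public);

        MethodBuilder mb = tb.DefineMethod("Test",
            MethodAttributes.Public | MethodAttributes.Static, typeof(object), Type.EmptyTypes);
        ILGenerator gen = mb.GetILGenerator();

        // List<Widget> is closed over a type builder that has not been created yet,
        // so the constructor must come from TypeBuilder.GetConstructor.
        Type variableType = typeof(List<>).MakeGenericType(tb);
        ConstructorInfo unbound = typeof(List<>).GetConstructor(Type.EmptyTypes);
        ConstructorInfo ci = TypeBuilder.GetConstructor(variableType, unbound);

        gen.Emit(OpCodes.Newobj, ci);
        gen.Emit(OpCodes.Ret);

        Type widget = tb.CreateType();
        object list = widget.GetMethod("Test").Invoke(null, null);
        return (widget, list);
    }
}
```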


Circular Dependencies 
Suppose that you want to build two types that reference each other, such as these: 


class A { public B Bee; } 
class B { public A Aye; } 


You can generate this dynamically: 


var publicAtt = FieldAttributes.Public; 


TypeBuilder aBuilder = modBuilder.DefineType ("A"); 
TypeBuilder bBuilder = modBuilder.DefineType ("B"); 


FieldBuilder bee = aBuilder.DefineField ("Bee", bBuilder, publicAtt); 
FieldBuilder aye = bBuilder.DefineField ("Aye", aBuilder, publicAtt); 


Type realA = aBuilder.CreateType(); 
Type realB = bBuilder.CreateType(); 


Notice that we didn’t call CreateType on aBuilder or bBuilder until we populated
both objects. The principle is this: first hook everything up, and then call
CreateType on each type builder.


Interestingly, the realA type is valid but dysfunctional until you call CreateType on
bBuilder. (If you started using realA prior to this, an exception would be
thrown when you tried to access field Bee.)


You might wonder how bBuilder knows to fix up realA after creating realB. The 
answer is that it doesn’t: realA can fix itself the next time it’s used. This is possible 
because after calling CreateType, a TypeBuilder morphs into a proxy for the real 
runtime type. So, realA, with its references to bBuilder, can easily obtain the meta- 
data it needs for the upgrade. 
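The following sketch packages the two-class example so that you can verify the fix-up: once both types have been created, each field's type resolves to the other created type. The CircularDemo helper and its assembly name are illustrative assumptions.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CircularDemo
{
    // Emits: class A { public B Bee; }  class B { public A Aye; }
    public static (Type realA, Type realB) Build()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("CircularDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder modBuilder = ab.DefineDynamicModule("MainModule");
        var publicAtt = FieldAttributes.Public;

        // First define both types and hook up their members...
        TypeBuilder aBuilder = modBuilder.DefineType("A");
        TypeBuilder bBuilder = modBuilder.DefineType("B");
        aBuilder.DefineField("Bee", bBuilder, publicAtt);
        bBuilder.DefineField("Aye", aBuilder, publicAtt);

        // ...and only then create them.
        Type realA = aBuilder.CreateType();
        Type realB = bBuilder.CreateType();
        return (realA, realB);
    }
}
```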


This system works when the type builder demands simple information of the 
unconstructed type—information that can be predetermined—such as type, mem- 
ber, and object references. In creating realA, the type builder doesn’t need to know, 
for instance, how many bytes realB will eventually occupy in memory. This is just 
as well because realB has not yet been created! But now imagine that realB was a 
struct. The final size of realB is now critical information in creating realA. 


If the relationship is noncyclical, for instance:


struct A { public B Bee; } 
struct B { } 


you can solve this by first creating struct B and then struct A. But consider this: 


struct A { public B Bee; } 
struct B { public A Aye; } 
We won't try to emit this because it’s nonsensical to have two structs contain each
other (C# generates a compile-time error if you try). But the following variation is 
both legal and useful: 


public struct S<T> { ... } // S can be empty and this demo will work. 


class A { S<B> Bee; } 
class B { S<A> Aye; } 


In creating A, a TypeBuilder now needs to know the memory footprint of B, and 
vice versa. To illustrate, let’s assume that struct S is defined statically. Here’s the code 
to emit classes A and B: 


var pub = FieldAttributes.Public; 


TypeBuilder aBuilder = modBuilder.DefineType ("A"); 
TypeBuilder bBuilder = modBuilder.DefineType ("B"); 


aBuilder.DefineField ("Bee", typeof(S<>).MakeGenericType (bBuilder), pub); 
bBuilder.DefineField ("Aye", typeof(S<>).MakeGenericType (aBuilder), pub); 


Type realA = aBuilder.CreateType(); // Error: cannot load type B 
Type realB = bBuilder.CreateType(); 


CreateType now throws a TypeLoadException no matter in which order you go: 


• Call aBuilder.CreateType first and it says “cannot load type B”.
• Call bBuilder.CreateType first and it says “cannot load type A”!


To solve this, you must allow the type builder to create realB partway through cre- 
ating realA. You do this by handling the TypeResolve event on the AppDomain class 
just before calling CreateType. So, in our example, we replace the last two lines with 
this: 


TypeBuilder[] uncreatedTypes = { aBuilder, bBuilder }; 


ResolveEventHandler handler = delegate (object o, ResolveEventArgs args)


{ 
var type = uncreatedTypes.FirstOrDefault (t => t.FullName == args.Name); 


return type == null ? null : type.CreateType().Assembly; 
}; 


AppDomain.CurrentDomain.TypeResolve += handler; 


Type realA = aBuilder.CreateType(); 
Type realB = bBuilder.CreateType(); 


AppDomain.CurrentDomain.TypeResolve -= handler; 


The TypeResolve event fires during the call to aBuilder.CreateType, at the point 
when it needs you to call CreateType on bBuilder. 
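For reference, here is the whole TypeResolve technique assembled into one method. It is a sketch under the same assumption as the text (an empty, statically defined struct S<T>, made public here); the try/finally, which guarantees that the handler is detached, is our own addition. Because CreateType returns the already-created type when called a second time, the final call on bBuilder is harmless if the handler has already created B.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

// Statically defined generic struct; an empty one is enough for the demo.
public struct S<T> { }

public static class TypeResolveDemo
{
    // Emits: class A { public S<B> Bee; }  class B { public S<A> Aye; }
    public static (Type realA, Type realB) Build()
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("TypeResolveDemo"), AssemblyBuilderAccess.Run);
        ModuleBuilder modBuilder = ab.DefineDynamicModule("MainModule");
        var pub = FieldAttributes.Public;

        TypeBuilder aBuilder = modBuilder.DefineType("A");
        TypeBuilder bBuilder = modBuilder.DefineType("B");
        aBuilder.DefineField("Bee", typeof(S<>).MakeGenericType(bBuilder), pub);
        bBuilder.DefineField("Aye", typeof(S<>).MakeGenericType(aBuilder), pub);

        TypeBuilder[] uncreatedTypes = { aBuilder, bBuilder };

        // Raised when the runtime needs a type it cannot load yet:
        // create that type on demand and hand back its assembly.
        ResolveEventHandler handler = delegate (object o, ResolveEventArgs args)
        {
            var type = uncreatedTypes.FirstOrDefault(t => t.FullName == args.Name);
            return type == null ? null : type.CreateType().Assembly;
        };

        AppDomain.CurrentDomain.TypeResolve += handler;
        try
        {
            Type realA = aBuilder.CreateType();
            Type realB = bBuilder.CreateType();   // Returns the existing type if the handler made it
            return (realA, realB);
        }
        finally
        {
            AppDomain.CurrentDomain.TypeResolve -= handler;
        }
    }
}
```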










Handling the TypeResolve event as in this example is also 
necessary when defining a nested type, when the nested and 
parent types refer to each other. 


Parsing IL 


You can obtain information about the content of an existing method by calling
GetMethodBody on a MethodBase object. This returns a MethodBody object that has
properties for inspecting a method's local variables, exception handling clauses,
and stack size, as well as the raw IL. This is rather like the reverse of Reflection.Emit!


Inspecting a method’s raw IL can be useful in profiling code. A simple use would be 
to determine which methods in an assembly have changed when an assembly is 
updated. 
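As a sketch of the change-detection idea, you can hash each method's IL bytes and compare the hashes between builds. The IlFingerprint helper below is hypothetical, and its result is only a rough signal: IL embeds metadata tokens, so a recompilation can change the bytes even when a method's logic is unchanged.

```csharp
using System;
using System.Reflection;
using System.Security.Cryptography;

public static class IlFingerprint
{
    // Returns a hex SHA-256 hash of a method's raw IL, or null if the method
    // has no body (for example, abstract or extern methods).
    public static string Hash(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        if (body == null) return null;
        byte[] il = body.GetILAsByteArray();
        using (SHA256 sha = SHA256.Create())
            return BitConverter.ToString(sha.ComputeHash(il));
    }

    // Two sample methods with different bodies, used to exercise Hash.
    public static int Add(int a, int b) => a + b;
    public static int Multiply(int a, int b) => a * b;
}
```

Storing the hash for every method of an assembly and diffing the two sets would then give you a list of candidate changes.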


To illustrate parsing IL, we'll write an application that disassembles IL in the style of 
ildasm. This could be used as the starting point for a code analysis tool or a higher- 
level language disassembler. 


Remember that in the reflection API, all of C#’s functional
constructs are either represented by a MethodBase subtype or
(in the case of properties, events, and indexers) have MethodBase
objects attached to them.


Writing a Disassembler 


Here is a sample of the output that our disassembler will produce: 


IL_00EB: ldfld        Disassembler._pos
IL_00F0: ldloc.2
IL_00F1: add
IL_00F2: ldelema      System.Byte
IL_00F7: ldstr        "Hello world"
IL_00FC: call         System.Byte.ToString
IL_0101: ldstr        " "
IL_0106: call         System.String.Concat


To obtain this output, we must parse the binary tokens that make up the IL. The 
first step is to call the GetILAsByteArray method on MethodBody to obtain the IL as 
a byte array. To make the rest of the job easier, we will write this into a class as 
follows: 


public class Disassembler 


{ 
public static string Disassemble (MethodBase method)
=> new Disassembler (method).Dis();


StringBuilder _output; // The result to which we'll keep appending 


Module _module; // This will come in handy later 
byte[] _il; // The raw byte code 
int _pos; // The position we're up to in the byte code 
Disassembler (MethodBase method)


{ 
_module = method.DeclaringType.Module; 


_il = method.GetMethodBody().GetILAsByteArray(); 
} 


string Dis() 

{ 
_output = new StringBuilder(); 
while (_pos < _il.Length) DisassembleNextInstruction(); 
return _output.ToString(); 


} 
} 


The static Disassemble method will be the only public member of this class. All 
other members will be private to the disassembly process. The Dis method contains 
the main loop where we process each instruction. 


With this skeleton in place, all that remains is to write DisassembleNextInstruction.
But before doing so, it will help to load all the opcodes into a static
dictionary so that we can access them by their 8- or 16-bit value. The easiest way to 
accomplish this is to use reflection to retrieve all the static fields whose type is 
OpCode in the OpCodes class: 


static Dictionary<short,OpCode> _opcodes = new Dictionary<short,OpCode>();

static Disassembler()
{
foreach (FieldInfo fi in typeof (OpCodes).GetFields
(BindingFlags.Public | BindingFlags.Static))
if (typeof (OpCode).IsAssignableFrom (fi.FieldType))
{
OpCode code = (OpCode) fi.GetValue (null); // Get field's value
if (code.OpCodeType != OpCodeType.Nternal) // (sic) "Nternal" is the real enum member name
_opcodes.Add (code.Value, code);
}
}


We've written it in a static constructor so that it executes just once. 


Now we can write DisassembleNextInstruction. Each IL instruction consists of a 
one- or two-byte opcode, followed by an operand of zero, one, two, four, or eight 
bytes. (An exception is inline switch opcodes, which are followed by a variable 
number of operands.) So, we read the opcode, then the operand, and then write out 
the result: 


void DisassembleNextInstruction() 


{
int opStart = _pos;

OpCode code = ReadOpCode();
string operand = ReadOperand (code);


_output.AppendFormat ("IL_{0:X4}: {1,-12} {2}", 
opStart, code.Name, operand); 
_output.AppendLine(); 
} 


To read an opcode, we advance one byte and see whether we have a valid instruc- 
tion. If not, we advance another byte and look for a two-byte instruction: 


OpCode ReadOpCode() 


{ 
byte byteCode = _il [_pos++]; 
if (_opcodes.ContainsKey (byteCode)) return _opcodes [byteCode]; 


if (_pos == _il.Length) throw new Exception ("Unexpected end of IL"); 
short shortCode = (short) (byteCode * 256 + _il [_pos++]);


if (!_opcodes.ContainsKey (shortCode) ) 
throw new Exception ("Cannot find opcode " + shortCode); 


return _opcodes [shortCode]; 


} 
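To see ReadOpCode's two-byte logic in isolation, the following sketch builds the same kind of opcode dictionary and decodes a short, hand-written IL sequence. It handles only operand-free instructions (an operand byte would otherwise be misread as an opcode), and the OpCodeDecoder name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeDecoder
{
    static readonly Dictionary<short, OpCode> _opcodes = new Dictionary<short, OpCode>();

    static OpCodeDecoder()
    {
        foreach (FieldInfo fi in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
            if (fi.GetValue(null) is OpCode code && code.OpCodeType != OpCodeType.Nternal)
                _opcodes.Add(code.Value, code);
    }

    // Decodes a sequence of operand-free instructions into opcode names.
    public static List<string> Names(byte[] il)
    {
        var names = new List<string>();
        int pos = 0;
        while (pos < il.Length)
        {
            byte first = il[pos++];
            if (!_opcodes.TryGetValue(first, out OpCode code))
            {
                // Not a one-byte opcode (e.g. the 0xFE prefix): combine with the next byte.
                if (pos == il.Length) throw new InvalidOperationException("Unexpected end of IL");
                short two = unchecked((short)(first * 256 + il[pos++]));
                code = _opcodes[two];   // Throws KeyNotFoundException for invalid IL
            }
            names.Add(code.Name);
        }
        return names;
    }
}
```

For example, the bytes 02 03 58 FE 01 2A decode as ldarg.0, ldarg.1, add, ceq, ret; ceq is the two-byte opcode FE 01.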


To read an operand, we first must establish its length. We can do this based on the 
operand type. Because most are four bytes long, we can filter out the exceptions 
fairly easily in a conditional clause. 


The next step is to call FormatOperand, which attempts to format the operand: 


string ReadOperand (OpCode c)
{
int operandLength =
c.OperandType == OperandType.InlineNone
? 0 :
c.OperandType == OperandType.ShortInlineBrTarget ||
c.OperandType == OperandType.ShortInlineI ||
c.OperandType == OperandType.ShortInlineVar
? 1 :
c.OperandType == OperandType.InlineVar
? 2 :
c.OperandType == OperandType.InlineI8 ||
c.OperandType == OperandType.InlineR
? 8 :
c.OperandType == OperandType.InlineSwitch
? 4 * (BitConverter.ToInt32 (_il, _pos) + 1) :
4; // All others are 4 bytes

if (_pos + operandLength > _il.Length)
throw new Exception ("Unexpected end of IL");

string result = FormatOperand (c, operandLength);
if (result == null)
{ // Write out operand bytes in hex
result = "";
for (int i = 0; i < operandLength; i++)
result += _il [_pos + i].ToString ("X2") + " ";
}
_pos += operandLength; 
return result; 


} 
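In C# 8, the chain of conditional operators that computes operandLength can be written more readably as a switch expression. The following is an alternative formulation of the same rules rather than a drop-in part of the disassembler; the OperandSizes name is hypothetical, and InlineSwitch is rejected because its length has to be read from the IL itself.

```csharp
using System;
using System.Reflection.Emit;

public static class OperandSizes
{
    // Operand length in bytes for each OperandType except InlineSwitch.
    public static int Length(OperandType t) => t switch
    {
        OperandType.InlineNone => 0,
        OperandType.ShortInlineBrTarget => 1,
        OperandType.ShortInlineI => 1,
        OperandType.ShortInlineVar => 1,
        OperandType.InlineVar => 2,
        OperandType.InlineI8 => 8,
        OperandType.InlineR => 8,
        OperandType.InlineSwitch =>
            throw new ArgumentException("Switch operands are variable-length"),
        _ => 4   // All others are 4 bytes
    };
}
```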


If the result of calling FormatOperand is null, it means the operand needs no
special formatting, so we simply write it out in hexadecimal. We could test the
disassembler at this point by writing a FormatOperand method that always returns null.
Here's what the output would look like:


IL_00A8: ldfld        98 00 00 04
IL_00AD: ldloc.2
IL_00AE: add
IL_00AF: ldelema      64 00 00 01
IL_00B4: ldstr        26 04 00 70
IL_00B9: call         B6 00 00 0A
IL_00BE: ldstr        11 01 00 70
IL_00C3: call         91 00 00 0A


Although the opcodes are correct, the operands are not much use. Instead of hexa- 
decimal numbers, we want member names and strings. The FormatOperand 
method, when it’s written, will address this—identifying the special cases that bene- 
fit from such formatting. These comprise most four-byte operands and the short 
branch instructions: 


string FormatOperand (OpCode c, int operandLength) 


{ 
if (operandLength == 0) return ""; 
if (operandLength == 4) 
return Get4ByteOperand (c); 
else if (c.OperandType == OperandType.ShortInlineBrTarget) 
return GetShortRelativeTarget();
else if (c.OperandType == OperandType.InlineSwitch)
return GetSwitchTarget (operandLength);
else 
return null; 
} 
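The hex dump earlier shows what these four-byte operands really are: little-endian metadata tokens whose top byte names a metadata table (0x04 for fields, 0x0A for member references, 0x70 for user strings, and so on) and whose lower three bytes give the row. ResolveMember and ResolveString do this lookup for you; the following sketch merely decodes a token by hand. The MetadataTokens name is hypothetical, the table list covers only common cases, and it assumes a little-endian machine, because BitConverter.ToInt32 uses the machine's byte order while IL always stores tokens little-endian.

```csharp
using System;

public static class MetadataTokens
{
    // Reads a four-byte operand the same way Get4ByteOperand does.
    public static int ReadToken(byte[] il, int pos) => BitConverter.ToInt32(il, pos);

    // The top byte of a token identifies the metadata table.
    public static string TableName(int token) => (token >> 24) switch
    {
        0x01 => "TypeRef",
        0x02 => "TypeDef",
        0x04 => "Field",
        0x06 => "MethodDef",
        0x0A => "MemberRef",
        0x1B => "TypeSpec",
        0x2B => "MethodSpec",
        0x70 => "UserString",
        _ => "Other"
    };

    // The low three bytes are the row (or, for user strings, the heap offset).
    public static int Row(int token) => token & 0x00FFFFFF;
}
```

So the operand 98 00 00 04 after ldfld is token 0x04000098 (row 0x98 of the Field table), and 26 04 00 70 after ldstr refers to the user string at 0x426.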


There are three kinds of four-byte operands that we treat specially. The first is refer- 
ences to members or types—with these, we extract the member or type name by 
calling the defining module's ResolveMember method. The second case is strings— 
these are stored in the assembly module’s metadata and can be retrieved by calling 
ResolveString. The final case is branch targets, where the operand refers to a byte 
offset in the IL. We format these by working out the absolute address after the cur- 
rent instruction (+ four bytes): 


string Get4ByteOperand (OpCode c)
{
int intOp = BitConverter.ToInt32 (_il, _pos);

switch (c.OperandType)
{
case OperandType.InlineTok:
case OperandType.InlineMethod:
case OperandType.InlineField:
case OperandType.InlineType:
MemberInfo mi;
try { mi = _module.ResolveMember (intOp); }
catch { return null; }
if (mi == null) return null;

if (mi.ReflectedType != null)
return mi.ReflectedType.FullName + "." + mi.Name;
else if (mi is Type)
return ((Type)mi).FullName;
else
return mi.Name;

case OperandType.InlineString:
string s = _module.ResolveString (intOp);
if (s != null) s = "\"" + s + "\"";
return s;

case OperandType.InlineBrTarget:
return "IL_" + (_pos + intOp + 4).ToString ("X4");

default:
return null;
}
}

The point where we call ResolveMember is a good window for 
a code analysis tool that reports on method dependencies. 


For any other four-byte opcode, we return null (this will cause ReadOperand to for- 
mat the operand as hex digits). 


The final kinds of operand that need special attention are short branch targets and
inline switches. A short branch target describes the destination offset as a single
signed byte, relative to the end of the current instruction (i.e., + one byte). A switch
instruction's operand is a four-byte count followed by that many four-byte branch offsets:


string GetShortRelativeTarget()
{
int absoluteTarget = _pos + (sbyte) _il [_pos] + 1;
return "IL_" + absoluteTarget.ToString ("X4");
}

string GetSwitchTarget (int operandLength)
{
int targetCount = BitConverter.ToInt32 (_il, _pos);
string [] targets = new string [targetCount];
for (int i = 0; i < targetCount; i++)
{
int ilTarget = BitConverter.ToInt32 (_il, _pos + (i + 1) * 4);
targets [i] = "IL_" + (_pos + ilTarget + operandLength).ToString ("X4");
}
return "(" + string.Join (", ", targets) + ")";
}

This completes the disassembler. We can test it by disassembling one of its own
methods:

MethodInfo mi = typeof (Disassembler).GetMethod ( 
"ReadOperand", BindingFlags.Instance | BindingFlags.NonPublic) ; 


Console.WriteLine (Disassembler.Disassemble (mi)); 
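Because branch arithmetic is easy to get off by one, here is a standalone sketch of the two calculations used by GetShortRelativeTarget and GetSwitchTarget, in which operandPos plays the role of _pos (the index of the first operand byte). The BranchMath class and the sample byte arrays are illustrative.

```csharp
using System;

public static class BranchMath
{
    // Short branch: a 1-byte signed offset, relative to the end of the
    // instruction (operandPos + 1).
    public static int ShortTarget(byte[] il, int operandPos) =>
        operandPos + (sbyte)il[operandPos] + 1;

    // Switch: a 4-byte count N followed by N 4-byte offsets, all relative to
    // the end of the instruction (operandPos + 4 * (N + 1)).
    public static int[] SwitchTargets(byte[] il, int operandPos)
    {
        int count = BitConverter.ToInt32(il, operandPos);
        int end = operandPos + 4 * (count + 1);
        var targets = new int[count];
        for (int i = 0; i < count; i++)
            targets[i] = end + BitConverter.ToInt32(il, operandPos + (i + 1) * 4);
        return targets;
    }
}
```

With br.s at offset 0x10 and operand 0xFB (-5), the operand sits at 0x11, the instruction ends at 0x12, and the target is 0x12 - 5 = 0x0D.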
Dynamic Programming








Chapter 4 explained how dynamic binding works in the C# language. In this chap- 
ter, we look briefly at the Dynamic Language Runtime and then explore the follow- 
ing dynamic programming patterns: 


• Numeric type unification
• Dynamic member overload resolution
• Custom binding (implementing dynamic objects)
• Dynamic language interoperability


In Chapter 25, we describe how dynamic can improve COM 
interoperability. 


The types in this chapter reside in the System.Dynamic namespace, except for
CallSite<>, which resides in System.Runtime.CompilerServices.


The Dynamic Language Runtime 


C# relies on the Dynamic Language Runtime (DLR) to perform dynamic binding. 


Contrary to its name, the DLR is not a dynamic version of the CLR. Rather, it’s a 
library that sits atop the CLR—just like any other library such as System.Xml.dll. Its 
primary role is to provide runtime services to unify dynamic programming—in 
both statically and dynamically typed languages. Hence, languages such as C#, Vis- 
ual Basic, IronPython, and IronRuby all use the same protocol for calling functions 
dynamically. This allows them to share libraries and call code written in other 
languages. 
What Are Call Sites?


When the compiler encounters a dynamic expression, it has no idea who will evalu- 
ate that expression at runtime. For instance, consider the following method: 


public dynamic Foo (dynamic x, dynamic y) 
{ 


return x / y; // Dynamic expression 


} 


The x and y variables could be any CLR object, a COM object, or even an object 
hosted in a dynamic language. The compiler cannot, therefore, take its usual static 
approach of emitting a call to a known method of a known type. Instead, the com- 
piler emits code that eventually results in an expression tree that describes the oper- 
ation, managed by a call site that the DLR will bind at runtime. The call site 
essentially acts as an intermediary between caller and callee. 


A call site is represented by the CallSite<> class in System.Core.dll. We can see this 
by disassembling the preceding method—the result is something like this: 


static CallSite<Func<CallSite,object,object,object>> divideSite; 


[return: Dynamic] 
public object Foo ([Dynamic] object x, [Dynamic] object y)
{


if (divideSite == null) 
divideSite = 
CallSite<Func<CallSite,object,object,object>>.Create ( 
Microsoft.CSharp.RuntimeBinder.Binder.BinaryOperation ( 
CSharpBinderFlags.None, 
ExpressionType.Divide, 
/* Remaining arguments omitted for brevity */ )); 


return divideSite.Target (divideSite, x, y); 


} 


As you can see, the call site is cached in a static field to avoid the cost of re-creating 
it on each call. The DLR further caches the result of the binding phase and the 
actual method targets. (There can be multiple targets depending on the types of x 
and y.) 


The actual dynamic call then happens by calling the site’s Target (a delegate), pass- 
ing in the x and y operands. 


Notice that the Binder class is specific to C#. Every language with support for 
dynamic binding provides a language-specific binder to help the DLR interpret 
expressions in a manner specific to that language, so as not to surprise the 
programmer. For instance, if we called Foo with integer values of 5 and 2, the C# 
binder would ensure that we got back 2. In contrast, a VB.NET binder would give us 
2.5. 
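You can build an equivalent call site by hand to see the C# binder's semantics directly. The sketch below mirrors the compiler-generated divideSite field; the DivideSiteDemo class is hypothetical, and passing typeof(DivideSiteDemo) as the binder's context (which the binder uses for accessibility checks) is our choice. It relies on Microsoft.CSharp.dll, which .NET Core references by default.

```csharp
using System;
using System.Linq.Expressions;
using System.Runtime.CompilerServices;
using Microsoft.CSharp.RuntimeBinder;

public static class DivideSiteDemo
{
    // Hand-built counterpart of the compiler's cached divideSite field.
    static readonly CallSite<Func<CallSite, object, object, object>> divideSite =
        CallSite<Func<CallSite, object, object, object>>.Create(
            Binder.BinaryOperation(
                CSharpBinderFlags.None,
                ExpressionType.Divide,
                typeof(DivideSiteDemo),
                new[]
                {
                    CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null),
                    CSharpArgumentInfo.Create(CSharpArgumentInfoFlags.None, null)
                }));

    // Binds x / y at runtime using C# rules for the operands' runtime types.
    public static object Divide(object x, object y) => divideSite.Target(divideSite, x, y);
}
```

Dividing the boxed integers 5 and 2 yields 2 (integer division, as C# would do statically), whereas 5.0 and 2 yield 2.5.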














852 | Chapter 20: Dynamic Programming 


The DLR also makes it relatively easy to write new dynamic languages in .NET. 
Instead of having to emit IL, dynamic language authors work at the level of expres- 
sion trees (the same expression trees in System.Linq.Expressions that we talked 
about in Chapter 8). 


The DLR further ensures that all consumers get the benefit of call-site caching, an 
optimization whereby the DLR avoids unnecessarily repeating the potentially 
expensive member resolution decisions made during dynamic binding. 


Numeric Type Unification 


Chapter 4 explained how dynamic lets us write a single method that works across all 
numeric types: 


static dynamic Mean (dynamic x, dynamic y) => (x + y) / 2;


static void Main() 


{ 
int x = 3, y = 5;
Console.WriteLine (Mean (x, y)); 


} 


It's a humorous reflection on C# that the keywords static and 
dynamic can appear adjacently! The same applies to the key- 
words internal and extern. 


However, this (unnecessarily) sacrifices static type safety. The following compiles 
without error but then fails at runtime: 


string s = Mean (3, 5); // Runtime error! 


We can fix this by introducing a generic type parameter and then casting to dynamic 
within the calculation itself: 


static T Mean<T> (T x, T y) 


{ 
dynamic result = ((dynamic) x + y) / 2; 
return (T) result; 


} 


Notice that we explicitly cast the result back to T. If we omitted this cast, we'd be
relying on an implicit cast, which might at first appear to work correctly. The
implicit cast would fail at runtime, though, upon calling the method with an 8- or 
16-bit integral type. To understand why, consider what happens with ordinary static 
typing when you sum two 8-bit numbers together: 


byte b = 3; 

Console.WriteLine ((b + b).GetType().Name); // Int32 
We get an Int32—because the compiler “promotes” 8- or 16-bit numbers to Int32 
prior to performing arithmetic operations. For consistency, the C# binder instructs 
the DLR to do exactly the same thing, and we end up with an Int32 that requires an 
explicit cast to the smaller numeric type. Of course, this could create the possibility
of overflow if we were, say, summing rather than averaging the values. 
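The following sketch demonstrates the failure described above. MeanImplicit is a hypothetical variant of Mean that relies on the implicit conversion from dynamic to T; with byte arguments, the DLR produces an int, and the runtime binder refuses the implicit int-to-byte conversion by throwing a RuntimeBinderException.

```csharp
using Microsoft.CSharp.RuntimeBinder;

public static class MeanCastDemo
{
    // The version from the text, with an explicit (T) cast.
    public static T Mean<T>(T x, T y)
    {
        dynamic result = ((dynamic)x + y) / 2;
        return (T)result;
    }

    // Same calculation, but relying on an implicit conversion from dynamic to T.
    public static T MeanImplicit<T>(T x, T y)
    {
        dynamic result = ((dynamic)x + y) / 2;
        return result;
    }

    // True if MeanImplicit fails for bytes: the sum is promoted to int,
    // and there is no implicit conversion from int to byte.
    public static bool ImplicitFailsForByte()
    {
        try { MeanImplicit<byte>(3, 5); return false; }
        catch (RuntimeBinderException) { return true; }
    }
}
```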


Dynamic binding incurs a small performance hit—even with call-site caching. You 
can mitigate this by adding statically typed overloads that cover just the most com- 
monly used types. For example, if subsequent performance profiling showed that 
calling Mean with doubles was a bottleneck, you could add the following overload: 


static double Mean (double x, double y) => (x + y) / 2;


The compiler will favor that overload when Mean is called with arguments that are 
known at compile time to be of type double. 
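A quick way to confirm which overload the compiler picks is to count calls into the dynamic version. In this sketch, the MeanOverloads class and its DynamicCalls counter are purely illustrative; decimal arguments take the dynamic path because decimal has no implicit conversion to double.

```csharp
public static class MeanOverloads
{
    public static int DynamicCalls;   // Counts how often the dynamic overload runs

    public static dynamic Mean(dynamic x, dynamic y)
    {
        DynamicCalls++;
        return (x + y) / 2;
    }

    // Statically typed fast path for doubles.
    public static double Mean(double x, double y) => (x + y) / 2;
}
```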


Dynamic Member Overload Resolution 


Calling a statically known method with dynamically typed arguments defers member
overload resolution from compile time to runtime. This is useful in simplifying
certain programming tasks, such as implementing the Visitor design pattern. It’s also
useful in working around limitations imposed by C#’s static typing.


Simplifying the Visitor Pattern 


In essence, the Visitor pattern allows you to “add” a method to a class hierarchy 
without altering existing classes. Although useful, this pattern in its static incarna- 
tion is subtle and unintuitive compared to most other design patterns. It also 
requires that visited classes be made “Visitor-friendly” by exposing an Accept 
method, which can be impossible if the classes are not under your control. 


With dynamic binding, you can achieve the same goal more easily—and without 
needing to modify existing classes. To illustrate, consider the following class 
hierarchy: 


class Person 


{ 
public string FirstName { get; set; } 


public string LastName { get; set; } 


// The Friends collection may contain Customers & Employees: 
public readonly IList<Person> Friends = new Collection<Person> (); 


} 


class Customer : Person { public decimal CreditLimit { get; set; } } 
class Employee : Person { public decimal Salary { get; set; } } 


Suppose that we want to write a method that programmatically exports a Person's 
details to an XML XElement. The most obvious solution is to write a virtual method 
called ToXElement() in the Person class that returns an XElement populated with a 
Person’s properties. We would then override it in Customer and Employee classes 
such that the XElement was also populated with CreditLimit and Salary. This pat- 
tern can be problematic, however, for two reasons: 
• You might not own the Person, Customer, and Employee classes, making it
impossible to add methods to them. (And extension methods wouldn't give
polymorphic behavior.)

• The Person, Customer, and Employee classes might already be quite big. A
frequent antipattern is the “God Object,” in which a class such as Person attracts
so much functionality that it becomes a nightmare to maintain. A good
antidote is to avoid adding functions to Person that don’t need to access Person's
private state. A ToXElement method might be an excellent candidate.


With dynamic member overload resolution, we can write the ToXElement function- 
ality in a separate class, without resorting to ugly switches based on type: 


class ToXElementPersonVisitor
{
public XElement DynamicVisit (Person p) => Visit ((dynamic)p);


XElement Visit (Person p) 
{ 
return new XElement ("Person", 
new XAttribute ("Type", p.GetType().Name), 
new XElement ("FirstName", p.FirstName), 
new XElement ("LastName", p.LastName), 
p.Friends.Select (f => DynamicVisit (f)) 
); 
} 


XElement Visit (Customer c) // Specialized logic for customers 


{ 
XElement xe = Visit ((Person)c); // Call "base" method 
xe.Add (new XElement ("CreditLimit", c.CreditLimit)); 
return xe; 


} 


XElement Visit (Employee e) // Specialized logic for employees 
{ 
XElement xe = Visit ((Person)e); // Call "base" method 
xe.Add (new XElement ("Salary", e.Salary)); 
return xe; 
} 
} 


The DynamicVisit method performs a dynamic dispatch, calling the most specific
version of Visit as determined at runtime. Notice the p.Friends.Select line, in which we
call DynamicVisit on each person in the Friends collection. This ensures that if a
friend is a Customer or Employee, the correct overload is called.


We can demonstrate this class as follows: 


var cust = new Customer
{
FirstName = "Joe", LastName = "Bloggs", CreditLimit = 123
};
cust.Friends.Add (
new Employee { FirstName = "Sue", LastName = "Brown", Salary = 50000 }
);
Console.WriteLine (new ToXElementPersonVisitor().DynamicVisit (cust));
Here's the result: 


<Person Type="Customer"> 
<FirstName>Joe</FirstName> 
<LastName>Bloggs</LastName> 
<Person Type="Employee"> 
<FirstName>Sue</FirstName> 
<LastName>Brown</LastName> 
<Salary>50000</Salary> 
</Person> 
<CreditLimit>123</CreditLimit> 
</Person> 


Variations 


If you plan more than one visitor class, a useful variation is to define an abstract 
base class for visitors: 


abstract class PersonVisitor<T>
{
  public T DynamicVisit (Person p) { return Visit ((dynamic)p); }

  protected abstract T Visit (Person p);
  protected virtual T Visit (Customer c) { return Visit ((Person) c); }
  protected virtual T Visit (Employee e) { return Visit ((Person) e); }
}


Subclasses then don’t need to define their own DynamicVisit method: all they do is 
override the versions of Visit whose behavior they want to specialize. This also has 
the advantages of centralizing the methods that encompass the Person hierarchy 
and allowing implementers to call base methods more naturally: 


class ToXElementPersonVisitor : PersonVisitor<XElement>
{
  protected override XElement Visit (Person p)
  {
    return new XElement ("Person",
      new XAttribute ("Type", p.GetType().Name),
      new XElement ("FirstName", p.FirstName),
      new XElement ("LastName", p.LastName),
      p.Friends.Select (f => DynamicVisit (f))
    );
  }

  protected override XElement Visit (Customer c)
  {
    XElement xe = base.Visit (c);
    xe.Add (new XElement ("CreditLimit", c.CreditLimit));
    return xe;
  }

  protected override XElement Visit (Employee e)
  {
    XElement xe = base.Visit (e);
    xe.Add (new XElement ("Salary", e.Salary));
    return xe;
  }
}


You can then even subclass ToXElementPersonVisitor itself.
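As a further illustration, here is a minimal sketch of a second, hypothetical visitor (LabelVisitor) built on the abstract PersonVisitor<T> above. The Person, Customer, and Employee definitions are not part of this section, so the block declares minimal stand-ins whose shapes are assumed from the members used here. Only the Customer overload is specialized; an Employee falls through to the Person logic via the base class.

```csharp
using System.Collections.Generic;

// Hypothetical minimal stand-ins for the Person hierarchy (shapes assumed).
public class Person
{
    public string FirstName, LastName;
    public List<Person> Friends = new List<Person>();
}
public class Customer : Person { public decimal CreditLimit; }
public class Employee : Person { public decimal Salary; }

// The abstract visitor from the text, written with expression bodies.
public abstract class PersonVisitor<T>
{
    public T DynamicVisit (Person p) => Visit ((dynamic) p);

    protected abstract T Visit (Person p);
    protected virtual T Visit (Customer c) => Visit ((Person) c);
    protected virtual T Visit (Employee e) => Visit ((Person) e);
}

// Hypothetical second visitor: builds a one-line label per person.
// Only Customer is specialized; Employee uses the base fallback to Visit (Person).
public class LabelVisitor : PersonVisitor<string>
{
    protected override string Visit (Person p) => p.FirstName + " " + p.LastName;
    protected override string Visit (Customer c) => base.Visit (c) + " (customer)";
}
```

A Customer named Joe Bloggs is labeled "Joe Bloggs (customer)", and an Employee named Sue Brown is labeled "Sue Brown". Neither subclass had to write its own DynamicVisit.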
Multiple Dispatch


C# and the CLR have always supported a limited form of dynamism in the form of 
virtual method calls. This differs from C#’s dynamic binding in that for virtual 
method calls, the compiler must commit to a particular virtual member at compile 
time—based on the name and signature of a member you called. This means that: 


• The calling expression must be fully understood by the compiler (e.g., it must decide at compile time whether a target member is a field or property).

• Overload resolution must be completed entirely by the compiler, based on the compile-time argument types.


A consequence of that last point is that the ability to perform virtual method calls is 
known as single dispatch. To see why, consider the following method call (in which 
Walk is a virtual method): 


animal.Walk (owner); 


The runtime decision of whether to invoke a dog’s Walk method or a cat’s Walk 
method depends only on the type of the receiver, animal (hence, single). If many 
overloads of Walk accept different kinds of owner, an overload will be selected at 
compile time without regard to the actual runtime type of the owner object. In other 
words, only the runtime type of the receiver can vary which method gets called. 


In contrast, a dynamic call defers overload resolution until runtime: 


animal.Walk ((dynamic) owner); 


The final choice of which Walk method to call now depends on the types of both animal and owner. This is called multiple dispatch because the runtime types of arguments, in addition to the receiver type, contribute to the determination of which Walk method to call.
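The following sketch makes the difference concrete. Owner, Child, Animal, Dog, and DispatchDemo are hypothetical types invented for this example. The only difference between the two methods is the (dynamic) cast on the argument.

```csharp
// Hypothetical types, invented for this sketch.
public class Owner { }
public class Child : Owner { }

public class Animal
{
    public virtual string Walk (Owner o) => "Animal walked by an owner";
    public virtual string Walk (Child c) => "Animal walked by a child";
}

public class Dog : Animal
{
    public override string Walk (Owner o) => "Dog walked by an owner";
    public override string Walk (Child c) => "Dog walked by a child";
}

public static class DispatchDemo
{
    // Single dispatch: the overload is fixed at compile time from owner's static
    // type (Owner); only the receiver's runtime type can change the result.
    public static string Single (Animal animal, Owner owner) => animal.Walk (owner);

    // Multiple dispatch: the dynamic argument defers overload resolution to
    // runtime, so owner's runtime type is taken into account as well.
    public static string Multiple (Animal animal, Owner owner) => animal.Walk ((dynamic) owner);
}
```

Suppose a Dog receives a Child typed as Owner. Single returns "Dog walked by an owner": the receiver varied, but the overload did not. Multiple returns "Dog walked by a child".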











Anonymously Calling Members of a Generic Type 


The strictness of C#’s static typing is a double-edged sword. On the one hand, it enforces a degree of correctness at compile time. On the other hand, it occasionally makes certain kinds of code difficult or impossible to express, at which point you must resort to reflection. In these situations, dynamic binding is a cleaner and faster alternative to reflection.


An example is when you need to work with an object of type G<T> where T is 
unknown. We can illustrate this by defining the following class: 


public class Foo<T> { public T Value; } 
Suppose that we then write a method as follows: 


static void Write (object obj)
{
  if (obj is Foo<>)                              // Illegal
    Console.WriteLine (((Foo<>) obj).Value);     // Illegal
}


This method won't compile: you can’t invoke members of unbound generic types. 


Dynamic binding offers two means by which we can work around this. The first is 
to access the Value member dynamically as follows: 


static void Write (dynamic obj)
{
  try { Console.WriteLine (obj.Value); }
  catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException) {...}
}


This has the (potential) advantage of working with any object that defines a Value field or property. However, there are a couple of problems. First, catching an exception in this manner is somewhat messy and inefficient (and there’s no way to ask the DLR in advance, “Will this operation succeed?”). Second, this approach wouldn’t work if Foo were an interface (say, IFoo<T>) and either of the following conditions were true:


• Value was implemented explicitly.

• The type that implemented IFoo<T> was inaccessible (more on this soon).


A better solution is to write an overloaded helper method called GetFooValue and 
to call it using dynamic member overload resolution: 


static void Write (dynamic obj)
{
  object result = GetFooValue (obj);
  if (result != null) Console.WriteLine (result);
}

static T GetFooValue<T> (Foo<T> foo) => foo.Value;
static object GetFooValue (object foo) => null;


Notice that we overloaded GetFooValue to accept an object parameter, which acts as a fallback for any type. At runtime, the C# dynamic binder will pick the best overload when calling GetFooValue with a dynamic argument. If the object in question is not based on Foo<T>, it will choose the object-parameter overload instead of throwing an exception.


An alternative is to write just the first GetFooValue overload 
and then catch the RuntimeBinderException. The advantage 
is that it distinguishes the case of foo.Value being null. The 
disadvantage is that it incurs the performance overhead of 
throwing and catching an exception. 
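To see the fallback in action, here is a variant of the Write example that returns the value instead of printing it. FooReader and Read are hypothetical names, and Foo<T> and the two GetFooValue overloads are as in the text. The dynamic call is bound in the context of FooReader, so the private overloads stay accessible to the runtime binder.

```csharp
// Foo<T> as defined in the text.
public class Foo<T> { public T Value; }

// Hypothetical helper: the same overload trick as Write, but returning the value.
public static class FooReader
{
    // obj is dynamic, so overload resolution happens at runtime.
    public static object Read (dynamic obj) => GetFooValue (obj);

    static T GetFooValue<T> (Foo<T> foo) => foo.Value;   // chosen for any Foo<T>
    static object GetFooValue (object foo) => null;       // fallback for everything else
}
```

Read (new Foo<int> { Value = 42 }) yields 42 (T is inferred as int at runtime). Read ("hello") binds to the object overload and yields null.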


In Chapter 19, we solved the same problem with an interface using reflection, with a lot more effort (see “Anonymously Calling Members of a Generic Interface” on page 815). The example we used was to design a more powerful version of ToString() that could understand objects such as IEnumerable and IGrouping<,>. Here’s the same example solved more elegantly using dynamic binding:
static string GetGroupKey<TKey,TElement> (IGrouping<TKey,TElement> group)
  => "Group with key=" + group.Key + ": ";

static string GetGroupKey (object source) => null;

public static string ToStringEx (object value)
{
  if (value == null) return "<null>";
  if (value is string) return (string) value;
  if (value.GetType().IsPrimitive) return value.ToString();

  StringBuilder sb = new StringBuilder();

  string groupKey = GetGroupKey ((dynamic)value);   // Dynamic dispatch
  if (groupKey != null) sb.Append (groupKey);

  if (value is IEnumerable)
    foreach (object element in ((IEnumerable)value))
      sb.Append (ToStringEx (element) + " ");

  if (sb.Length == 0) sb.Append (value.ToString());

  return "\r\n" + sb.ToString();
}


Here it is in action: 


Console.WriteLine (ToStringEx ("xyyzzz".GroupBy (c => c) )); 


Group with key=x: x 
Group with key=y: y y 
Group with key=z: z z z


Notice that we used dynamic member overload resolution to solve this problem. If we instead did this:

dynamic d = value;
try { groupKey = d.Key; }
catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException) {...}

it would fail, because LINQ’s GroupBy operator returns a type that implements IGrouping<,> but is itself internal, and therefore inaccessible:


internal class Grouping : IGrouping<TKey,TElement>, ...
{
  public TKey Key;
  ...
}


Even though the Key property is declared public, its containing class caps it at internal, making it accessible only via the IGrouping<,> interface. And as is explained in Chapter 4, there’s no way to instruct the DLR to bind to that interface when invoking the Key member dynamically.


Implementing Dynamic Objects 


An object can provide its binding semantics by implementing IDynamicMetaObjectProvider, or more easily by subclassing DynamicObject, which provides a default implementation of this interface. This is demonstrated briefly in Chapter 4 via the following example:


static void Main()
{
  dynamic d = new Duck();
  d.Quack();      // Quack method was called
  d.Waddle();     // Waddle method was called
}

public class Duck : DynamicObject
{
  public override bool TryInvokeMember (
    InvokeMemberBinder binder, object[] args, out object result)
  {
    Console.WriteLine (binder.Name + " method was called");
    result = null;
    return true;
  }
}


DynamicObject 


In the preceding example, we overrode TryInvokeMember, which allows the consumer to invoke a method on the dynamic object, such as Quack or Waddle. DynamicObject exposes other virtual methods that enable consumers to use other programming constructs as well. The following correspond to constructs that have representations in C#:
Method | Programming construct
TryInvokeMember | Method
TryGetMember, TrySetMember | Property or field
TryGetIndex, TrySetIndex | Indexer
TryUnaryOperation | Unary operator such as !
TryBinaryOperation | Binary operator such as ==
TryConvert | Conversion (cast) to another type
TryInvoke | Invocation on the object itself, e.g., d("foo")

These methods should return true if successful. If they return false, the DLR will fall back to the language binder, looking for a matching member on the DynamicObject (subclass) itself. If this fails, a RuntimeBinderException is thrown.
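Here is a minimal sketch of two more hooks from the table, TryConvert and TryUnaryOperation. Temperature is a hypothetical class. The ConvertBinder.Type and UnaryOperationBinder.Operation properties tell the override which conversion or operator was requested. Unary minus arrives as ExpressionType.Negate.

```csharp
using System.Dynamic;
using System.Linq.Expressions;

// Hypothetical class exercising TryConvert and TryUnaryOperation.
public class Temperature : DynamicObject
{
    public readonly double Celsius;
    public Temperature (double celsius) { Celsius = celsius; }

    // Invoked for conversions such as (double) t or double x = t.
    public override bool TryConvert (ConvertBinder binder, out object result)
    {
        if (binder.Type == typeof (double))
        {
            result = Celsius;
            return true;
        }
        result = null;
        return false;   // unsupported target type: the binder throws RuntimeBinderException
    }

    // Invoked for unary operators; binder.Operation says which one.
    public override bool TryUnaryOperation (UnaryOperationBinder binder, out object result)
    {
        if (binder.Operation == ExpressionType.Negate)
        {
            result = new Temperature (-Celsius);
            return true;
        }
        result = null;
        return false;
    }
}
```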


We can illustrate TryGetMember and TrySetMember with a class that lets us dynamically access an attribute in an XElement (System.Xml.Linq):


static class XExtensions
{
  public static dynamic DynamicAttributes (this XElement e)
    => new XWrapper (e);

  class XWrapper : DynamicObject
  {
    XElement _element;
    public XWrapper (XElement e) { _element = e; }

    public override bool TryGetMember (GetMemberBinder binder,
                                       out object result)
    {
      result = _element.Attribute (binder.Name).Value;
      return true;
    }

    public override bool TrySetMember (SetMemberBinder binder,
                                       object value)
    {
      _element.SetAttributeValue (binder.Name, value);
      return true;
    }
  }
}


Here’s how to use it: 


XElement x = XElement.Parse (@"<Label Text=""Hello"" Id=""5""/>"); 
dynamic da = x.DynamicAttributes(); 

Console.WriteLine (da.Id); // 5 

da.Text = "Foo"; 

Console.WriteLine (x.ToString()); // <Label Text="Foo" Id="5" /> 
The following does a similar thing for System.Data.IDataRecord, making it easier to use data readers:


public class DynamicReader : DynamicObject
{
  readonly IDataRecord _dataRecord;
  public DynamicReader (IDataRecord dr) { _dataRecord = dr; }

  public override bool TryGetMember (GetMemberBinder binder,
                                     out object result)
  {
    result = _dataRecord [binder.Name];
    return true;
  }
}


using (IDataReader reader = someDbCommand.ExecuteReader())
{
  dynamic dr = new DynamicReader (reader);
  while (reader.Read())
  {
    int id = dr.ID;
    string firstName = dr.FirstName;
    DateTime dob = dr.DateOfBirth;
    ...
  }
}
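The example above leaves someDbCommand abstract, so this sketch substitutes an in-memory DataTable. Its CreateDataReader method returns a reader that implements IDataReader, and therefore IDataRecord. DynamicReaderDemo, ReadFirstRow, and the column values are made up for illustration.

```csharp
using System;
using System.Data;
using System.Dynamic;

// DynamicReader as shown in the text.
public class DynamicReader : DynamicObject
{
    readonly IDataRecord _dataRecord;
    public DynamicReader (IDataRecord dr) { _dataRecord = dr; }

    public override bool TryGetMember (GetMemberBinder binder, out object result)
    {
        result = _dataRecord [binder.Name];   // look the column up by member name
        return true;
    }
}

// Hypothetical demo: an in-memory table stands in for a real database command.
public static class DynamicReaderDemo
{
    public static string ReadFirstRow()
    {
        var table = new DataTable();
        table.Columns.Add ("ID", typeof (int));
        table.Columns.Add ("FirstName", typeof (string));
        table.Columns.Add ("DateOfBirth", typeof (DateTime));
        table.Rows.Add (1, "Joe", new DateTime (1980, 1, 31));

        using (IDataReader reader = table.CreateDataReader())
        {
            dynamic dr = new DynamicReader (reader);
            reader.Read();                      // advance to the first row
            int id = dr.ID;                     // object values are converted
            string firstName = dr.FirstName;    // to the target types by the
            DateTime dob = dr.DateOfBirth;      // runtime binder
            return id + " " + firstName + " " + dob.Year;
        }
    }
}
```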


The following demonstrates TryBinaryOperation and TryInvoke: 


static void Main()
{
  dynamic d = new Duck();
  Console.WriteLine (d + d);          // foo
  Console.WriteLine (d (78, 'x'));    // 123
}

public class Duck : DynamicObject
{
  public override bool TryBinaryOperation (BinaryOperationBinder binder,
                                           object arg, out object result)
  {
    Console.WriteLine (binder.Operation);   // Add
    result = "foo";
    return true;
  }

  public override bool TryInvoke (InvokeBinder binder,
                                  object[] args, out object result)
  {
    Console.WriteLine (args[0]);    // 78
    result = 123;
    return true;
  }
}
DynamicObject also exposes some virtual methods for the benefit of dynamic languages. In particular, overriding GetDynamicMemberNames allows you to return a list of all member names that your dynamic object provides.


Another reason to implement GetDynamicMemberNames is that Visual Studio’s debugger makes use of this method to display a view of a dynamic object.


ExpandoObject 


Another simple application of DynamicObject would be to write a dynamic class 
that stored and retrieved objects in a dictionary, keyed by string. However, this 
functionality is already provided via the ExpandoObject class: 


dynamic x = new ExpandoObject();

x.FavoriteColor = ConsoleColor.Green; 
x.FavoriteNumber = 7; 

Console.WriteLine (x.FavoriteColor); // Green 
Console.WriteLine (x.FavoriteNumber); // 7 


ExpandoObject implements IDictionary<string,object>—so we can continue 
our example and do this: 


var dict = (IDictionary<string,object>) x; 


Console.WriteLine (dict ["FavoriteColor"]); // Green 
Console.WriteLine (dict ["FavoriteNumber"]); // 7 
Console.WriteLine (dict.Count); // 2 
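For comparison, here is roughly how you might hand-roll the dictionary-backed DynamicObject described at the start of this section. DictionaryObject is a hypothetical name. This is a sketch, not a replacement for ExpandoObject; for instance, it omits the IDictionary<string,object> view shown above.

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical hand-rolled analog of ExpandoObject.
public class DictionaryObject : DynamicObject
{
    readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    // Returning false for an unknown name makes the binder throw
    // RuntimeBinderException, as for a missing member on a static type.
    public override bool TryGetMember (GetMemberBinder binder, out object result)
        => _values.TryGetValue (binder.Name, out result);

    public override bool TrySetMember (SetMemberBinder binder, object value)
    {
        _values [binder.Name] = value;   // add or overwrite
        return true;
    }

    // Lets tools such as the debugger list the members that have been set.
    public override IEnumerable<string> GetDynamicMemberNames() => _values.Keys;
}
```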


Interoperating with Dynamic Languages 


Although C# supports dynamic binding via the dynamic keyword, it doesn’t go as 
far as allowing you to execute an expression described in a string at runtime: 


string expr = "2 * 3"; 

// We can't "execute" expr 
This is because the code to translate a string into an expression tree requires a lexical and semantic parser. These features are built into the C# compiler and are not available as a runtime service. At runtime, C# merely provides a binder, which instructs the DLR how to interpret an already built expression tree.
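Running an expression tree that already exists is easy; only the step from string to tree is missing. The sketch below builds the tree for 2 * 3 by hand with System.Linq.Expressions, then compiles and runs it with no parser involved. ExpressionTreeDemo is a hypothetical name. The example calls LINQ expression trees directly rather than going through the dynamic binder.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    public static int MultiplyTwoByThree()
    {
        // Hand-built equivalent of the C# expression 2 * 3 (no string parsing).
        Expression body = Expression.Multiply (Expression.Constant (2), Expression.Constant (3));
        Func<int> compiled = Expression.Lambda<Func<int>> (body).Compile();
        return compiled();   // 6
    }
}
```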


True dynamic languages such as IronPython and IronRuby do allow you to execute an arbitrary string, and this is useful in tasks such as scripting, dynamic configuration, and implementing dynamic rules engines. So, although you can write most of your application in C#, it can be useful to call out to a dynamic language for such tasks. In addition, you might want to use an API that is written in a dynamic language where no equivalent functionality is available in a .NET library.
The Roslyn scripting NuGet package Microsoft.CodeAnalysis.CSharp.Scripting provides an API that lets you execute a C# string, although it does so by first compiling your code into a program. The compilation overhead makes it slower than Python interop, unless you intend to execute the same expression repeatedly.


In the following example, we use IronPython to evaluate an expression created at 
runtime from within C#. You could use this script to write a calculator: 


using System;
using IronPython.Hosting;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;

class Calculator
{
  static void Main()
  {
    int result = (int) Calculate ("2 * 3");
    Console.WriteLine (result);    // 6
  }

  static object Calculate (string expression)
  {
    ScriptEngine engine = Python.CreateEngine();
    return engine.Execute (expression);
  }
}


To run this code, add the NuGet packages DynamicLanguageRuntime (not to be confused with the System.Dynamic.Runtime package) and IronPython to your application.


Because we’re passing a string into Python, the expression will be evaluated according to Python’s rules and not C#’s. It also means that we can use Python’s language features, such as lists:


var list = (IEnumerable) Calculate ("[1, 2, 3] + [4, 5]"); 
foreach (int n in list) Console.Write (n); // 12345 


Passing State Between C# and a Script 


To pass variables from C# to Python, a few more steps are required. The following 
example illustrates those steps and could be the basis of a rules engine: 


// The following string could come from a file or database:
string auditRule = "taxPaidLastYear / taxPaidThisYear > 2";

ScriptEngine engine = Python.CreateEngine ();
ScriptScope scope = engine.CreateScope ();

scope.SetVariable ("taxPaidLastYear", 20000m);
scope.SetVariable ("taxPaidThisYear", 8000m);

ScriptSource source = engine.CreateScriptSourceFromString (
                      auditRule, SourceCodeKind.Expression);

bool auditRequired = (bool) source.Execute (scope);
Console.WriteLine (auditRequired);   // True


You can also get variables back by calling GetVariable: 


string code = "result = input * 3";

ScriptEngine engine = Python.CreateEngine();
ScriptScope scope = engine.CreateScope();
scope.SetVariable ("input", 2);

ScriptSource source = engine.CreateScriptSourceFromString (code,
                      SourceCodeKind.SingleStatement);
source.Execute (scope);
Console.WriteLine (scope.GetVariable ("result"));   // 6


Notice that we specified SourceCodeKind.SingleStatement in the second example 
(rather than Expression) to inform the engine that we want to execute a statement. 


Types are automatically marshaled between the .NET and Python worlds. You can 
even access members of .NET objects from the scripting side: 


string code = @"sb.Append (""World"")";
ScriptEngine engine = Python.CreateEngine ();
ScriptScope scope = engine.CreateScope ();
var sb = new StringBuilder ("Hello");
scope.SetVariable ("sb", sb);

ScriptSource source = engine.CreateScriptSourceFromString (
                      code, SourceCodeKind.SingleStatement);
source.Execute (scope);
Console.WriteLine (sb.ToString());   // HelloWorld
21


Cryptography 








In this chapter, we discuss the major cryptography APIs in .NET Core: 


• Windows Data Protection (DPAPI)
• Hashing
• Symmetric encryption
• Public key encryption and signing


The types covered in this chapter are defined in the following namespaces: 


System.Security;
System.Security.Cryptography;


Overview 


Table 21-1 summarizes the cryptography options in .NET. In the remaining sections, we explore each of these.


Table 21-1. Encryption and hashing options in .NET 


Option | Keys to manage | Speed | Strength | Notes
File.Encrypt | 0 | Fast | Depends on user’s password | Protects files transparently with filesystem support. A key is derived implicitly from the logged-in user’s credentials. Windows only.
Windows Data Protection | 0 | Fast | Depends on user’s password | Encrypts and decrypts byte arrays using an implicitly derived key.
Hashing | 0 | Fast | High | One-way (irreversible) transformation. Used for storing passwords, comparing files, and checking for data corruption.
Symmetric Encryption | 1 | Fast | High | For general-purpose encryption/decryption. The same key encrypts and decrypts. Can be used to secure messages in transit.
Public Key Encryption | 2 | Slow | High | Encryption and decryption use different keys. Used for exchanging a symmetric key in message transmission and for digitally signing files.


.NET Core also provides more specialized support for creating and validating XML-based signatures in System.Security.Cryptography.Xml and types for working with digital certificates in System.Security.Cryptography.X509Certificates.


Windows Data Protection 


In the section “File and Directory Operations” on page 665 in Chapter 15, we 
described how you could use File.Encrypt to request that the operating system 
transparently encrypt a file: 


File.WriteAllText ("myfile.txt", "");
File.Encrypt ("myfile.txt"); 
File.AppendAllText ("myfile.txt", "sensitive data"); 


The encryption in this case uses a key derived from the logged-in user’s password. You can use this same implicitly derived key to encrypt a byte array with the Windows Data Protection API (DPAPI). The DPAPI is exposed through the ProtectedData class, a simple type with two static methods:


public static byte[] Protect 
(byte[] userData, byte[] optionalEntropy, DataProtectionScope scope); 


public static byte[] Unprotect 
(byte[] encryptedData, byte[] optionalEntropy, DataProtectionScope scope); 


Windows Data Protection is available on Windows only, and throws a PlatformNotSupportedException on other operating systems.


Whatever you include in optionalEntropy is added to the key, thereby increasing its security. The DataProtectionScope enum argument allows two options: CurrentUser or LocalMachine. With CurrentUser, a key is derived from the logged-in user’s credentials; with LocalMachine, a machine-wide key is used, common to all users. This means that with the CurrentUser scope, data encrypted by one user cannot be decrypted by another. A LocalMachine key provides less protection but works under a Windows Service or a program needing to operate under a variety of accounts.
Here’s a simple encryption and decryption demonstration:


byte[] original = {1, 2, 3, 4, 5}; 
DataProtectionScope scope = DataProtectionScope.CurrentUser; 


byte[] encrypted = ProtectedData.Protect (original, null, scope); 
byte[] decrypted = ProtectedData.Unprotect (encrypted, null, scope); 
// decrypted is now {1, 2, 3, 4, 5} 


Windows Data Protection provides moderate security against an attacker with full 
access to the computer, depending on the strength of the user’s password. With 
LocalMachine scope, it’s effective only against those with restricted physical and 
electronic access. 


Hashing 


A hashing algorithm distills a potentially large number of bytes into a small fixed-length hashcode. Hashing algorithms are designed such that a single-bit change anywhere in the source data results in a significantly different hashcode. This makes hashing suitable for comparing files or detecting accidental (or malicious) corruption to a file or data stream.


Hashing also acts as one-way encryption, because it’s difficult-to-impossible to convert a hashcode back into the original data. This makes it ideal for storing passwords in a database, because should your database become compromised, you don’t want the attacker to gain access to plain-text passwords. To authenticate, you simply hash what the user types in and compare it to the hash that’s stored in the database.


To hash, you call ComputeHash on one of the HashAlgorithm subclasses such as SHA1 
or SHA256: 


byte[] hash; 
using (Stream fs = File.OpenRead ("checkme.doc")) 
hash = SHA1.Create().ComputeHash (fs); // SHA1 hash is 20 bytes long 


The ComputeHash method also accepts a byte array, which is convenient for hashing 
passwords (we describe a more secure technique in “Hashing Passwords” on page 
870): 


byte[] data = System.Text.Encoding.UTF8.GetBytes ("stRhong%pword");
byte[] hash = SHA256.Create().ComputeHash (data); 


The GetBytes method on an Encoding object converts a string to a byte array; the GetString method converts it back. An Encoding object cannot, however, convert an encrypted or hashed byte array to a string, because scrambled data usually violates text encoding rules. Instead, use Convert.ToBase64String and Convert.FromBase64String: these convert between any byte array and a legal (and XML- or JSON-friendly) string.
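This sketch combines the steps above: it hashes a password with SHA256, stores the result as a Base64 string, and later verifies a login attempt by rehashing and comparing. PasswordHashDemo and its methods are hypothetical. The comparison uses CryptographicOperations.FixedTimeEquals (available from .NET Core 2.1), so timing doesn't reveal how many leading bytes matched. The hash is unsalted only to keep the round trip simple; the next section covers salting and stretching.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helpers: store a hash as Base64 text, then verify against it.
public static class PasswordHashDemo
{
    public static string HashToBase64 (string password)
    {
        using (SHA256 sha = SHA256.Create())
            return Convert.ToBase64String (sha.ComputeHash (Encoding.UTF8.GetBytes (password)));
    }

    public static bool Verify (string attempt, string storedBase64)
    {
        byte[] stored = Convert.FromBase64String (storedBase64);   // back to raw bytes
        byte[] candidate;
        using (SHA256 sha = SHA256.Create())
            candidate = sha.ComputeHash (Encoding.UTF8.GetBytes (attempt));
        // Constant-time comparison of the two 32-byte hashes.
        return CryptographicOperations.FixedTimeEquals (candidate, stored);
    }
}
```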
Hash Algorithms in .NET Core


SHA1 and SHA256 are two of the HashAlgorithm subtypes provided by .NET Core. 
Here are all the major algorithms, in ascending order of security (and hash length, 
in bytes): 


MD5(16) → SHA1(20) → SHA256(32) → SHA384(48) → SHA512(64)


MD5 and SHA1 are currently the fastest algorithms, although the other algorithms are 
not more than (roughly) two times slower in their current implementations. To give 
a ballpark figure, you can expect a performance of more than 100 MB per second 
with any of these algorithms on today’s typical desktop or server. The longer hashes 
decrease the possibility of collision (two distinct files yielding the same hash). 


Use at least SHA256 when hashing passwords or other security- 
sensitive data. MD5 and SHA1 are considered insecure for this 
purpose, and are suitable to protect only against accidental 
corruption, not deliberate tampering. 
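You can confirm the byte lengths in the list above through each algorithm's HashSize property, which reports the size in bits. HashSizes is a hypothetical helper name.

```csharp
using System.Security.Cryptography;

public static class HashSizes
{
    // HashSize is in bits; dividing by 8 gives the byte lengths listed above.
    public static int[] InBytes()
    {
        HashAlgorithm[] algorithms =
            { MD5.Create(), SHA1.Create(), SHA256.Create(), SHA384.Create(), SHA512.Create() };
        var sizes = new int [algorithms.Length];
        for (int i = 0; i < algorithms.Length; i++)
        {
            sizes [i] = algorithms [i].HashSize / 8;
            algorithms [i].Dispose();
        }
        return sizes;   // { 16, 20, 32, 48, 64 }
    }
}
```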


Hashing Passwords 


The longer SHA algorithms are suitable as a basis for password hashing, if you 
enforce a strong password policy to mitigate a dictionary attack—a strategy whereby 
an attacker builds a password lookup table by hashing every word in a dictionary. 


A standard technique, when hashing passwords, is to incorporate “salt”: a long series of bytes that you initially obtain via a random number generator and then combine with each password before hashing. This frustrates hackers in two ways:

• They must also know the salt bytes.

• They cannot use rainbow tables (publicly available precomputed databases of passwords and their hashcodes), although a dictionary attack might still be possible with sufficient computing power.
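Here is a minimal sketch of salting. It assumes SHA256 as the hash and a 16-byte salt; both are our choices, not something the text prescribes. SaltedHash, NewSalt, and Compute are hypothetical names. The salt is stored alongside the hash, and the same salt is reused when verifying a password.

```csharp
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper illustrating salt generation and combination.
public static class SaltedHash
{
    public static byte[] NewSalt (int length = 16)
    {
        byte[] salt = new byte [length];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes (salt);                    // cryptographically strong bytes
        return salt;
    }

    public static byte[] Compute (string password, byte[] salt)
    {
        byte[] pwd = Encoding.UTF8.GetBytes (password);
        byte[] combined = new byte [salt.Length + pwd.Length];   // salt followed by password
        salt.CopyTo (combined, 0);
        pwd.CopyTo (combined, salt.Length);
        using (SHA256 sha = SHA256.Create())
            return sha.ComputeHash (combined);
    }
}
```

The same password hashed with two different salts produces unrelated hashcodes. That is why a precomputed table built without the salt doesn't help an attacker.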


You can further strengthen security by “stretching” your password hashes, repeatedly rehashing to obtain more computationally intensive byte sequences. If you rehash 100 times, a dictionary attack that might otherwise take one month would take eight years. The KeyDerivation, Rfc2898DeriveBytes, and PasswordDeriveBytes classes perform exactly this kind of stretching while also allowing for convenient salting. Of these, KeyDerivation.Pbkdf2 offers the best hashing:


byte[] encrypted = KeyDerivation.Pbkdf2 (
  password: "stRhong%pword",
  salt: Encoding.UTF8.GetBytes ("j78Y#p)/saREN!y3@"),
  prf: KeyDerivationPrf.HMACSHA512,
  iterationCount: 100,
  numBytesRequested: 64);
KeyDerivation.Pbkdf2 requires the NuGet package Microsoft.AspNetCore.Cryptography.KeyDerivation. Even though it’s in the ASP.NET Core namespace, any .NET Core application can use it.
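If you'd rather avoid the extra NuGet package, Rfc2898DeriveBytes in System.Security.Cryptography also implements PBKDF2. The sketch below mirrors the parameters above: HMAC-SHA512, 100 iterations, and 64 output bytes. Pbkdf2Demo is a hypothetical name. The overload that takes a HashAlgorithmName requires .NET Core 2.0 or later, and the constructor rejects salts shorter than 8 bytes.

```csharp
using System.Security.Cryptography;

// Hypothetical helper: in-box PBKDF2 stretching with a caller-supplied salt.
public static class Pbkdf2Demo
{
    public static byte[] Derive (string password, byte[] salt, int iterations = 100)
    {
        using (var pbkdf2 = new Rfc2898DeriveBytes (password, salt, iterations, HashAlgorithmName.SHA512))
            return pbkdf2.GetBytes (64);   // same length as numBytesRequested above
    }
}
```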


Symmetric Encryption 


Symmetric encryption uses the same key for encryption as for decryption. The 
Framework provides four symmetric algorithms, of which Rijndael (pronounced 
“Rhine Dahl” or “Rain Doll”) is the premium; the other algorithms are intended 
mainly for compatibility with older applications. Rijndael is both fast and secure 
and has two implementations: 


• The Rijndael class, which has been available since Framework 1.0

• The Aes class, which was introduced in Framework 3.5


The two are almost identical, except that Aes does not let you weaken the cipher by 
changing the block size. Aes is recommended by the CLR’s security team. 


Rijndael and Aes allow symmetric keys of length 16, 24, or 32 bytes: all are currently considered secure. Here’s how to encrypt a series of bytes as they’re written to a file, using a 16-byte key:


byte[] key = {145,12,32,245,98,132,98,214,6,77,131,44,221,3,9,50}; 
byte[] iv = {15,122,132,5,93,198,44,31,9,39,241,49,250,188,80,7}; 


byte[] data = { 1, 2, 3, 4, 5 };   // This is what we're encrypting.


using (SymmetricAlgorithm algorithm = Aes.Create())
using (ICryptoTransform encryptor = algorithm.CreateEncryptor (key, iv))
using (Stream f = File.Create ("encrypted.bin"))
using (Stream c = new CryptoStream (f, encryptor, CryptoStreamMode.Write))
  c.Write (data, 0, data.Length);


The following code decrypts the file: 


byte[] key = {145,12,32,245,98,132,98,214,6,77,131,44,221,3,9,50}; 
byte[] iv = {15,122,132,5,93,198,44,31,9,39,241,49,250,188,80,7}; 


byte[] decrypted = new byte[5]; 


using (SymmetricAlgorithm algorithm = Aes.Create()) 
using (ICryptoTransform decryptor = algorithm.CreateDecryptor (key, iv)) 
using (Stream f = File.OpenRead ("encrypted.bin")) 
using (Stream c = new CryptoStream (f, decryptor, CryptoStreamMode.Read))
  for (int b; (b = c.ReadByte()) > -1;)
    Console.Write (b + " ");                                  // 1 2 3 4 5


In this example, we made up a key of 16 randomly chosen bytes. If the wrong key 
was used in decryption, CryptoStream would throw a CryptographicException. 
Catching this exception is the only way to test whether a key is correct. 










As well as a key, we made up an IV, or Initialization Vector. This 16-byte sequence forms part of the cipher (much like the key) but is not considered secret. If you’re transmitting an encrypted message, you would send the IV in plain text (perhaps in a message header) and then change it with every message. This would render each encrypted message unrecognizable from any previous one, even if their unencrypted versions were similar or identical.
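Here is a minimal sketch of that practice. Each message gets a fresh IV from GenerateIV, written in the clear ahead of the ciphertext, and the receiver reads it back from there. IvPrefixedMessage, Seal, and Open are hypothetical names. The fixed 16-byte IV length reflects Aes's 128-bit block size. This scheme varies the ciphertext but adds no integrity protection.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: prefixes each ciphertext with its own random IV.
public static class IvPrefixedMessage
{
    public static byte[] Seal (byte[] plain, byte[] key)
    {
        using (Aes aes = Aes.Create())
        {
            aes.Key = key;
            aes.GenerateIV();                           // new random IV for this message
            using (var ms = new MemoryStream())
            {
                ms.Write (aes.IV, 0, aes.IV.Length);    // IV travels unencrypted
                using (ICryptoTransform enc = aes.CreateEncryptor())
                using (var cs = new CryptoStream (ms, enc, CryptoStreamMode.Write))
                    cs.Write (plain, 0, plain.Length);
                return ms.ToArray();                    // ToArray works after the stream closes
            }
        }
    }

    public static byte[] Open (byte[] message, byte[] key)
    {
        byte[] iv = new byte [16];
        Array.Copy (message, iv, 16);                   // first 16 bytes are the IV
        using (Aes aes = Aes.Create())
        using (ICryptoTransform dec = aes.CreateDecryptor (key, iv))
        using (var ms = new MemoryStream())
        {
            using (var cs = new CryptoStream (ms, dec, CryptoStreamMode.Write))
                cs.Write (message, 16, message.Length - 16);
            return ms.ToArray();
        }
    }
}
```

Sealing the same five bytes twice with the same key gives two different 32-byte messages: 16 bytes of IV plus one padded block. Both open to the original data.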


If you don’t need—or want—the protection of an IV, you can 
defeat it by using the same 16-byte value for both the key and 
the IV. Sending multiple messages with the same IV, though, 
weakens the cipher and might even make it possible to crack. 


The cryptography work is divided among the classes. Aes is the mathematician; it 
applies the cipher algorithm, along with its encryptor and decryptor transforms. 
CryptoStream is the plumber; it takes care of stream plumbing. You can replace Aes 
with a different symmetric algorithm, yet still use CryptoStream. 


CryptoStream is bidirectional, meaning you can read or write to the stream depending on whether you choose CryptoStreamMode.Read or CryptoStreamMode.Write. Both encryptors and decryptors are read and write savvy, yielding four combinations; the choice can have you staring at a blank screen for a while! It can be helpful to model reading as “pulling” and writing as “pushing.” If in doubt, start with Write for encryption and Read for decryption; this is often the most natural.


To generate a random key or IV, use RandomNumberGenerator in System.Security.Cryptography. The numbers it produces are genuinely unpredictable, or cryptographically strong (the System.Random class does not offer the same guarantee). Here’s an example:


byte[] key = new byte [16]; 

byte[] iv = new byte [16]; 

RandomNumberGenerator rand = RandomNumberGenerator.Create(); 
rand.GetBytes (key); 

rand.GetBytes (iv); 


If you don’t specify a key and IV, cryptographically strong random values are generated automatically. You can query these through the Aes object’s Key and IV properties.
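A quick way to observe the automatically generated values is to check their lengths. In this sketch, setting KeySize discards any existing key, so the Key property generates a fresh key of the new size on first access. The IV always matches Aes's 16-byte block. AesDefaults and Inspect are hypothetical names.

```csharp
using System.Security.Cryptography;

public static class AesDefaults
{
    // Returns the byte lengths of the auto-generated key and IV for a given key size.
    public static (int keyBytes, int ivBytes) Inspect (int keySizeBits)
    {
        using (Aes aes = Aes.Create())
        {
            aes.KeySize = keySizeBits;                 // 128, 192, or 256
            return (aes.Key.Length, aes.IV.Length);    // both generated on first access
        }
    }
}
```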


Encrypting in Memory 


With a MemoryStream, you can encrypt and decrypt entirely in memory. Here are 
helper methods that do just this, with byte arrays: 


public static byte[] Encrypt (byte[] data, byte[] key, byte[] iv)
{
  using (Aes algorithm = Aes.Create())
  using (ICryptoTransform encryptor = algorithm.CreateEncryptor (key, iv))
    return Crypt (data, encryptor);
}

public static byte[] Decrypt (byte[] data, byte[] key, byte[] iv)
{
  using (Aes algorithm = Aes.Create())
  using (ICryptoTransform decryptor = algorithm.CreateDecryptor (key, iv))
    return Crypt (data, decryptor);
}

static byte[] Crypt (byte[] data, ICryptoTransform cryptor)
{
  MemoryStream m = new MemoryStream();
  using (Stream c = new CryptoStream (m, cryptor, CryptoStreamMode.Write))
    c.Write (data, 0, data.Length);
  return m.ToArray();
}
Here, CryptoStreamMode.Write works best for both encryption and decryption, 
since in both cases we're “pushing” into a fresh memory stream. 
Here are overloads that accept and return strings:





public static string Encrypt (string data, byte[] key, byte[] iv)
{
  return Convert.ToBase64String (
    Encrypt (Encoding.UTF8.GetBytes (data), key, iv));
}

public static string Decrypt (string data, byte[] key, byte[] iv)
{
  return Encoding.UTF8.GetString (
    Decrypt (Convert.FromBase64String (data), key, iv));
}


The following demonstrates their use: 


byte[] key = new byte[16]; 
byte[] iv = new byte[16]; 


var cryptoRng = RandomNumberGenerator.Create(); 
cryptoRng.GetBytes (key); 
cryptoRng.GetBytes (iv); 


string encrypted = Encrypt ("Yeah!", key, iv); 
Console.WriteLine (encrypted);   // R1/5gYvcxyR2vzPjnT7yaQ==


string decrypted = Decrypt (encrypted, key, iv); 
Console.WriteLine (decrypted); // Yeah! 
Chaining Encryption Streams 


CryptoStream is a decorator, meaning that you can chain it with other streams. In the following example, we write compressed encrypted text to a file and then read it back:


byte[] key = new byte [16];
byte[] iv = new byte [16];

var cryptoRng = RandomNumberGenerator.Create();
cryptoRng.GetBytes (key);
cryptoRng.GetBytes (iv);

using (Aes algorithm = Aes.Create())
{
  using (ICryptoTransform encryptor = algorithm.CreateEncryptor (key, iv))
  using (Stream f = File.Create ("serious.bin"))
  using (Stream c = new CryptoStream (f, encryptor, CryptoStreamMode.Write))
  using (Stream d = new DeflateStream (c, CompressionMode.Compress))
  using (StreamWriter w = new StreamWriter (d))
    await w.WriteLineAsync ("Small and secure!");

  using (ICryptoTransform decryptor = algorithm.CreateDecryptor (key, iv))
  using (Stream f = File.OpenRead ("serious.bin"))
  using (Stream c = new CryptoStream (f, decryptor, CryptoStreamMode.Read))
  using (Stream d = new DeflateStream (c, CompressionMode.Decompress))
  using (StreamReader r = new StreamReader (d))
    Console.WriteLine (await r.ReadLineAsync());     // Small and secure!
}


(As a final touch, we make our program asynchronous by calling WriteLineAsync 
and ReadLineAsync, and awaiting the result.) 


In this example, all one-letter variables form part of a chain. The mathematicians (algorithm, encryptor, and decryptor) are there to assist CryptoStream in the cipher work, as illustrated in Figure 21-1.





Figure 21-1. Chaining encryption and compression streams

Chaining streams in this manner demands little memory, regardless of the ultimate stream sizes.


Disposing Encryption Objects 


Disposing a CryptoStream ensures that its internal cache of data is flushed to the 
underlying stream. Internal caching is necessary for encryption algorithms because 
they process data in blocks, rather than one byte at a time. 


CryptoStream is unusual in that its Flush method does nothing. To flush a stream 
(without disposing it) you must call FlushFinalBlock. In contrast to Flush, you can 
call FlushFinalBlock only once, and then no further data can be written. 
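The sketch below makes the difference visible. It assumes a random key and IV and uses a hypothetical FlushFinalBlockDemo class. With a 5-byte input, AES (block size 16 bytes) has no complete block to emit after Flush. FlushFinalBlock pads the data and writes the single block, after which HasFlushedFinalBlock reports true.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical demo: compares the output length after Flush and after FlushFinalBlock.
static class FlushFinalBlockDemo
{
    public static (long AfterFlush, long AfterFinalBlock, bool Flushed) Run (byte[] data)
    {
        using Aes aes = Aes.Create();                       // random key and IV
        using ICryptoTransform encryptor = aes.CreateEncryptor();
        var m = new MemoryStream();
        using var c = new CryptoStream (m, encryptor, CryptoStreamMode.Write);

        c.Write (data, 0, data.Length);
        c.Flush();                    // does not push out the buffered partial block
        long afterFlush = m.Length;

        c.FlushFinalBlock();          // pads and writes the last block (callable once)
        long afterFinal = m.Length;

        return (afterFlush, afterFinal, c.HasFlushedFinalBlock);
    }
}
```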


We also disposed the mathematicians: the Aes algorithm and the ICryptoTransform objects (encryptor and decryptor). When these objects are disposed, they wipe the symmetric key and related data from memory, preventing subsequent discovery by other software running on the computer (we're talking malware). You can't rely on the garbage collector for this job, because it merely flags sections of memory as available; it doesn't write zeros over every byte.


The easiest way to dispose an Aes object outside of a using statement is to call Clear. Its Dispose method is hidden via explicit implementation (to signal its unusual disposal semantics, whereby it clears memory rather than releasing unmanaged resources).


You can further reduce your application's vulnerability to leaking secrets via released memory by:

• Avoiding strings for security information (being immutable, a string's value can never be cleared once created)

• Overwriting buffers as soon as they're no longer needed (for instance, by calling Array.Clear on a byte array)
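Here is a minimal sketch of the second suggestion. It wipes a key buffer in a finally block once the key has been used. The KeyHygiene class is hypothetical. Clearing the caller's array is a design choice made here for illustration.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: encrypts with a key held in a byte array, then wipes that array.
static class KeyHygiene
{
    public static byte[] EncryptAndWipe (byte[] plaintext, byte[] key, byte[] iv)
    {
        try
        {
            using Aes aes = Aes.Create();
            using ICryptoTransform enc = aes.CreateEncryptor (key, iv);
            return enc.TransformFinalBlock (plaintext, 0, plaintext.Length);
        }
        finally
        {
            // Overwrite the key buffer as soon as it's no longer needed.
            Array.Clear (key, 0, key.Length);
        }
    }
}
```

On .NET Core 2.1 and later, CryptographicOperations.ZeroMemory does the same job for a Span<byte>.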


Key Management 


Key management is a critical element of security: if your keys are exposed, so is your data. You need to consider who should have access to keys and how to back them up in case of hardware failure while storing them in a manner that prevents unauthorized access.


It is inadvisable to hardcode encryption keys because popular tools exist to decompile assemblies with little expertise required. A better option (on Windows) is to manufacture a random key for each installation, storing it securely with Windows Data Protection.


For applications deployed to the cloud, Microsoft Azure and Amazon Web Services 
(AWS) offer key-management systems with additional features that can be useful in 
an enterprise environment, such as audit trails. 









If you're encrypting a message stream, public-key encryption still provides the best option.


Public-Key Encryption and Signing 


Public-key cryptography is asymmetric, meaning that encryption and decryption 
use different keys. 


Unlike symmetric encryption, for which any arbitrary series of bytes of appropriate 
length can serve as a key, asymmetric cryptography requires specially crafted key 
pairs. A key pair contains a public key and private key component that work together 
as follows: 


• The public key encrypts messages.
• The private key decrypts messages.


The party “crafting” a key pair keeps the private key secret while distributing the 
public key freely. A special feature of this type of cryptography is that you cannot 
calculate a private key from a public key. So, if the private key is lost, encrypted data 
cannot be recovered; conversely, if a private key is leaked, the encryption system 
becomes useless. 


A public key handshake allows two computers to communicate securely over a public network, with no prior contact and no existing shared secret. To see how this works, suppose that computer Origin wants to send a confidential message to computer Target:


1. Target generates a public/private key pair and then sends its public key to Origin.

2. Origin encrypts the confidential message using Target's public key and then sends it to Target.

3. Target decrypts the confidential message using its private key.

An eavesdropper will see the following:


• Target's public key

• The secret message, encrypted with Target's public key


But without Target’s private key, the message cannot be decrypted. 


This doesn't protect against a man-in-the-middle attack: in other words, Origin cannot know that Target isn't some malicious party. To authenticate the recipient, the originator needs to already know the recipient's public key, or be able to validate its key through a digital site certificate.
Because public key encryption is relatively slow and its message size limited, the secret message sent from Origin to Target typically contains a fresh key for subsequent symmetric encryption. This allows public key encryption to be abandoned for the remainder of the session, in favor of a symmetric algorithm capable of handling larger messages. This protocol is particularly secure if a fresh public/private key pair is generated for each session because no keys then need to be stored on either computer.


The public key encryption algorithms rely on the message being smaller than the key. This makes them suitable for encrypting only small amounts of data, such as a key for subsequent symmetric encryption. If you try to encrypt a message much larger than half the key size, the provider will throw an exception.
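The exact limit depends on the padding scheme. The sketch below assumes OAEP with SHA-1, which is the padding selected by passing true to RSACryptoServiceProvider.Encrypt. Under that scheme, the maximum plaintext is the key length in bytes minus 42 (twice the 20-byte SHA-1 hash, plus 2). For a 2,048-bit key, that is 214 bytes. RsaSizeLimit is a hypothetical helper; RSA.Create and RSAEncryptionPadding are standard .NET Core types.

```csharp
using System.Security.Cryptography;

// Hypothetical helper illustrating the RSA plaintext-size limit under OAEP/SHA-1.
static class RsaSizeLimit
{
    // keyBytes - 2 * 20 (SHA-1 hash length) - 2
    public static int MaxOaepSha1Bytes (int keySizeBits) => keySizeBits / 8 - 42;

    // Returns true if encryption succeeded, false if the provider rejected the size.
    public static bool TryEncrypt (RSA rsa, int messageLength)
    {
        try
        {
            rsa.Encrypt (new byte[messageLength], RSAEncryptionPadding.OaepSHA1);
            return true;
        }
        catch (CryptographicException)
        {
            return false;   // message too long for this key and padding
        }
    }
}
```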


The RSA Class 


.NET Core provides a number of asymmetric algorithms, of which RSA is the most 
popular. Here’s how to encrypt and decrypt with RSA: 


byte[] data = { 1, 2, 3, 4, 5 };   // This is what we're encrypting.

using (var rsa = new RSACryptoServiceProvider())
{
  byte[] encrypted = rsa.Encrypt (data, true);
  byte[] decrypted = rsa.Decrypt (encrypted, true);
}


Because we didn't specify a public or private key, the cryptographic provider automatically generated a key pair, using the default length of 1,024 bits; you can request longer keys in increments of 8 bytes, through the constructor. For security-critical applications, it's prudent to request 2,048 bits:


var rsa = new RSACryptoServiceProvider (2048); 
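The true argument to Encrypt and Decrypt requests OAEP padding. For comparison, the sketch below performs the same round trip through the abstract RSA class, which picks a platform-appropriate implementation. The RsaRoundTrip wrapper is hypothetical. The ciphertext is always as long as the key modulus, which is 256 bytes for a 2,048-bit key.

```csharp
using System.Security.Cryptography;

// Hypothetical wrapper around a 2,048-bit RSA encrypt/decrypt round trip.
static class RsaRoundTrip
{
    public static (byte[] Decrypted, int KeySize, int CipherLength) Run (byte[] data)
    {
        using RSA rsa = RSA.Create (2048);
        // OaepSHA1 corresponds to passing true to RSACryptoServiceProvider.Encrypt.
        byte[] encrypted = rsa.Encrypt (data, RSAEncryptionPadding.OaepSHA1);
        byte[] decrypted = rsa.Decrypt (encrypted, RSAEncryptionPadding.OaepSHA1);
        return (decrypted, rsa.KeySize, encrypted.Length);
    }
}
```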


Generating a key pair is computationally intensive—taking perhaps 100 ms. For this 
reason, the RSA implementation delays this until a key is actually needed, such as 
when calling Encrypt. This gives you the chance to load in an existing key—or key 
pair, should it exist. 


The methods ImportCspBlob and ExportCspBlob load and save keys in byte array 
format. FromXmlString and ToXmlString do the same job in a string format, the 
string containing an XML fragment. A bool flag lets you indicate whether to 
include the private key when saving. Here’s how to manufacture a key pair and save 
it to disk: 


using (var rsa = new RSACryptoServiceProvider())
{
  File.WriteAllText ("PublicKeyOnly.xml", rsa.ToXmlString (false));
  File.WriteAllText ("PublicPrivate.xml", rsa.ToXmlString (true));
}
Because we didn't provide existing keys, ToXmlString forced the manufacture of a
fresh key pair (on the first call). In the next example, we read back these keys and 
use them to encrypt and decrypt a message: 


byte[] data = Encoding.UTF8.GetBytes ("Message to encrypt");

string publicKeyOnly = File.ReadAllText ("PublicKeyOnly.xml");
string publicPrivate = File.ReadAllText ("PublicPrivate.xml");

byte[] encrypted, decrypted;

using (var rsaPublicOnly = new RSACryptoServiceProvider())
{
  rsaPublicOnly.FromXmlString (publicKeyOnly);
  encrypted = rsaPublicOnly.Encrypt (data, true);

  // The next line would throw an exception because you need the private
  // key in order to decrypt:
  // decrypted = rsaPublicOnly.Decrypt (encrypted, true);
}

using (var rsaPublicPrivate = new RSACryptoServiceProvider())
{
  // With the private key we can successfully decrypt:
  rsaPublicPrivate.FromXmlString (publicPrivate);
  decrypted = rsaPublicPrivate.Decrypt (encrypted, true);
}

Digital Signing


You also can use public key algorithms to digitally sign messages or documents. A signature is like a hash, except that its production requires a private key and so cannot be forged. The public key is used to verify the signature. Here's an example:


byte[] data = Encoding.UTF8.GetBytes ("Message to sign");
byte[] publicKey;
byte[] signature;
object hasher = SHA1.Create();         // Our chosen hashing algorithm.

// Generate a new key pair, then sign the data with it:
using (var publicPrivate = new RSACryptoServiceProvider())
{
  signature = publicPrivate.SignData (data, hasher);
  publicKey = publicPrivate.ExportCspBlob (false);    // get public key
}

// Create a fresh RSA using just the public key, then test the signature.
using (var publicOnly = new RSACryptoServiceProvider())
{
  publicOnly.ImportCspBlob (publicKey);
  Console.Write (publicOnly.VerifyData (data, hasher, signature));     // True

  // Let's now tamper with the data, and recheck the signature:
  data[0] = 0;
  Console.Write (publicOnly.VerifyData (data, hasher, signature));     // False

  // The following throws an exception as we're lacking a private key:
  signature = publicOnly.SignData (data, hasher);
}


Signing works by first hashing the data, and then applying the asymmetric algorithm to the resultant hash. Because hashes are of a small fixed size, large documents can be signed relatively quickly (public key encryption is much more CPU-intensive than hashing). If you want, you can do the hashing yourself and then call SignHash instead of SignData:


using (var rsa = new RSACryptoServiceProvider())
{
  byte[] hash = SHA1.Create().ComputeHash (data);
  signature = rsa.SignHash (hash, CryptoConfig.MapNameToOID ("SHA1"));
}


SignHash still needs to know what hash algorithm you used; CryptoConfig.MapNameToOID provides this information in the correct format from a friendly name such as "SHA1".
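The sketch below follows the same two routes, SignData versus hashing first and calling SignHash, using the abstract RSA class. It swaps in SHA-256 for SHA-1 and passes a HashAlgorithmName instead of an OID. The SigningSketch class is hypothetical. Verification uses a second RSA object that holds only the exported public key.

```csharp
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: SignData vs. SignHash, verified with a public-key-only RSA object.
static class SigningSketch
{
    public static (bool DataSigValid, bool HashSigValid, bool TamperedValid) Run()
    {
        byte[] data = Encoding.UTF8.GetBytes ("Message to sign");
        using RSA rsa = RSA.Create (2048);

        // Let the RSA object hash the data for us...
        byte[] sig1 = rsa.SignData (data, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // ...or hash it ourselves and sign just the hash.
        using SHA256 sha = SHA256.Create();
        byte[] hash = sha.ComputeHash (data);
        byte[] sig2 = rsa.SignHash (hash, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        // Verification needs only the public key.
        using RSA publicOnly = RSA.Create();
        publicOnly.ImportParameters (rsa.ExportParameters (false));

        bool ok1 = publicOnly.VerifyData (data, sig1, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        bool ok2 = publicOnly.VerifyData (data, sig2, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

        data[0] ^= 0xFF;   // tamper with the message
        bool tampered = publicOnly.VerifyData (data, sig1, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        return (ok1, ok2, tampered);
    }
}
```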


RSACryptoServiceProvider produces signatures whose size matches that of the key. 
Currently, no mainstream algorithm produces secure signatures significantly 
smaller than 128 bytes (suitable for product activation codes, for instance). 
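The sketch below checks that size relationship. It assumes a 2,048-bit key and PKCS #1 v1.5 padding; SignatureSize is a hypothetical helper. The signature comes out at 2,048 / 8 = 256 bytes. By the same arithmetic, a 1,024-bit key gives 128-byte signatures.

```csharp
using System.Security.Cryptography;

// Hypothetical helper: signature length in bytes for a fresh RSA key of the given size.
static class SignatureSize
{
    public static int For (int keySizeBits)
    {
        using RSA rsa = RSA.Create (keySizeBits);
        byte[] sig = rsa.SignData (new byte[] { 1, 2, 3 },
                                   HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        return sig.Length;
    }
}
```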


For signing to be effective, the recipient must know, and trust, the sender's public key. This can happen via prior communication, preconfiguration, or a site certificate. A site certificate is an electronic record of the originator's public key and name, itself signed by an independent trusted authority. The namespace System.Security.Cryptography.X509Certificates defines the types for working with certificates.
22


Advanced Threading 








We started Chapter 14 with the basics of threading as a precursor to tasks and asynchrony. Specifically, we showed how to start and configure a thread, and covered essential concepts such as thread pooling, blocking, spinning, and synchronization contexts. We also introduced locking and thread safety, and demonstrated the simplest signaling construct, ManualResetEvent.


This chapter picks up where Chapter 14 left off on the topic of threading. In the first 
three sections, we flesh out synchronization, locking, and thread safety in greater 
detail. We then cover: 

• Nonexclusive locking (Semaphore and reader/writer locks)

• All of the signaling constructs (AutoResetEvent, ManualResetEvent, CountdownEvent, and Barrier)

• Lazy initialization (Lazy<T> and LazyInitializer)

• Thread-local storage (ThreadStaticAttribute, ThreadLocal<T>, and GetData/SetData)

• Timers

Threading is such a vast topic that we've put additional material online to complete the picture. Go online for a discussion on the following, more arcane, topics:

• Monitor.Wait and Monitor.Pulse for specialized signaling scenarios

• Nonblocking synchronization techniques for micro-optimization (Interlocked, memory barriers, volatile)

• SpinLock and SpinWait for high-concurrency scenarios
Synchronization Overview


Synchronization is the act of coordinating concurrent actions for a predictable outcome. Synchronization is particularly important when multiple threads access the same data; it's surprisingly easy to run aground in this area.


The simplest and most useful synchronization tools are arguably the continuations 
and task combinators described in Chapter 14. By formulating concurrent programs 
into asynchronous operations strung together with continuations and combinators, 
you lessen the need for locking and signaling. However, there are still times when 
the lower-level constructs come into play. 


The synchronization constructs can be divided into three categories: 


Exclusive locking
Exclusive locking constructs allow just one thread to perform some activity or execute a section of code at a time. Their primary purpose is to let threads access shared writable state without interfering with one another. The exclusive locking constructs are lock, Mutex, and SpinLock.


Nonexclusive locking 
Nonexclusive locking lets you limit concurrency. The nonexclusive locking 
constructs are Semaphore(Slim) and ReaderWriterLock(Slim). 


Signaling 
These allow a thread to block until receiving one or more notifications from 
other thread(s). The signaling constructs include ManualResetEvent(Slim), 
AutoResetEvent, CountdownEvent, and Barrier. The former three are referred 
to as event wait handles. 


It's also possible (and tricky) to perform certain concurrent operations on shared 
state without locking through the use of nonblocking synchronization constructs. 
These are Thread.MemoryBarrier, Thread.VolatileRead, Thread.VolatileWrite, 
the volatile keyword, and the Interlocked class. We cover this topic online, along 
with Monitor’s Wait/Pulse methods, which you can use to write custom signaling 
logic. 


Exclusive Locking 


There are three exclusive locking constructs: the lock statement, Mutex, and 
SpinLock. The lock construct is the most convenient and widely used, whereas the 
other two target niche scenarios: 


• Mutex lets you span multiple processes (computer-wide locks).

• SpinLock implements a micro-optimization that can lessen context switches in high-concurrency scenarios (see http://albahari.com/threading/).
The lock Statement


To illustrate the need for locking, consider the following class: 


class ThreadUnsafe
{
  static int _val1 = 1, _val2 = 1;

  static void Go()
  {
    if (_val2 != 0) Console.WriteLine (_val1 / _val2);
    _val2 = 0;
  }
}
This class is not thread-safe: if Go were called by two threads simultaneously, it 
would be possible to get a division-by-zero error because _val2 could be set to zero 
in one thread right as the other thread was in between executing the if statement 
and Console.WriteLine. Here’s how lock fixes the problem: 


class ThreadSafe
{
  static readonly object _locker = new object();
  static int _val1 = 1, _val2 = 1;

  static void Go()
  {
    lock (_locker)
    {
      if (_val2 != 0) Console.WriteLine (_val1 / _val2);
      _val2 = 0;
    }
  }
}
Only one thread can lock the synchronizing object (in this case, _locker) at a time, and any contending threads are blocked until the lock is released. If more than one thread contends the lock, they are queued on a "ready queue" and granted the lock on a first-come, first-served basis.¹ Exclusive locks are sometimes said to enforce serialized access to whatever's protected by the lock because one thread's access cannot overlap with that of another. In this case, we're protecting the logic inside the Go method as well as the fields _val1 and _val2.
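The sketch below shows the lock at work. In it, several threads increment a shared counter; LockDemo is hypothetical. _count++ is a read-modify-write operation, so each increment is wrapped in the lock. Without the lock, updates could be lost.

```csharp
using System.Threading;

// Hypothetical demo: threads increment a shared counter under an exclusive lock.
static class LockDemo
{
    static readonly object _locker = new object();
    static int _count;

    public static int Run (int threads, int incrementsPerThread)
    {
        _count = 0;
        var workers = new Thread [threads];
        for (int t = 0; t < threads; t++)
        {
            workers [t] = new Thread (() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                    lock (_locker) _count++;      // serialized read-modify-write
            });
            workers [t].Start();
        }
        foreach (Thread w in workers) w.Join();  // wait for every thread to finish
        return _count;
    }
}
```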





1 Nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated.
Monitor.Enter and Monitor.Exit


C#’s lock statement is in fact a syntactic shortcut for a call to the methods 
Monitor.Enter and Monitor.Exit, with a try/finally block. Here's (a simplified 
version of) what’s actually happening within the Go method of the preceding 
example: 


Monitor.Enter (_locker);
try
{
  if (_val2 != 0) Console.WriteLine (_val1 / _val2);
  _val2 = 0;
}
finally { Monitor.Exit (_locker); }


Calling Monitor.Exit without first calling Monitor.Enter on the same object 
throws an exception. 
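The hypothetical ExitWithoutEnter helper below demonstrates this. The exception in question is SynchronizationLockException.

```csharp
using System.Threading;

// Hypothetical helper: calls Monitor.Exit on an object this thread never locked.
static class ExitWithoutEnter
{
    public static bool Throws()
    {
        object locker = new object();
        try
        {
            Monitor.Exit (locker);    // no matching Monitor.Enter
            return false;
        }
        catch (SynchronizationLockException)
        {
            return true;
        }
    }
}
```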


The lockTaken overloads 


The code that we just demonstrated has a subtle vulnerability. Consider the (unlikely) event of an exception being thrown between the call to Monitor.Enter and the try block (due, perhaps, to an OutOfMemoryException or, in .NET Framework, if the thread is aborted). In such a scenario, the lock might or might not be taken. If the lock is taken, it won't be released, because we'll never enter the try/finally block. This will result in a leaked lock. To avoid this danger, Monitor.Enter defines the following overload:


public static void Enter (object obj, ref bool lockTaken);


lockTaken is false after this method if (and only if) the Enter method throws an 
exception and the lock was not taken. 


Here's the more robust pattern of use (which is exactly how C# translates a lock statement):


bool lockTaken = false;
try
{
  Monitor.Enter (_locker, ref lockTaken);
  // Do your stuff...
}
finally { if (lockTaken) Monitor.Exit (_locker); }


TryEnter 


Monitor also provides a TryEnter method that allows a timeout to be specified, either in milliseconds or as a TimeSpan. The method then returns true if a lock was obtained, or false if no lock was obtained because the method timed out. TryEnter can also be called with no timeout argument, which "tests" the lock, timing out immediately if the lock can't be obtained immediately. As with the Enter method, TryEnter is overloaded to accept a lockTaken argument.
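The sketch below calls TryEnter with a timeout while another thread is known to hold the lock. The TryEnterDemo class is hypothetical. The two ManualResetEventSlim objects only make the timing deterministic. The sketch uses the lockTaken overload, so the finally block releases only what was actually acquired.

```csharp
using System;
using System.Threading;

// Hypothetical demo: TryEnter times out because another thread holds the lock.
static class TryEnterDemo
{
    public static bool TryWhileHeldElsewhere (TimeSpan timeout)
    {
        object locker = new object();
        using var acquired = new ManualResetEventSlim();
        using var release = new ManualResetEventSlim();

        var holder = new Thread (() =>
        {
            lock (locker)
            {
                acquired.Set();    // tell the main thread we own the lock
                release.Wait();    // hold it until told to let go
            }
        });
        holder.Start();
        acquired.Wait();

        bool lockTaken = false;
        try
        {
            Monitor.TryEnter (locker, timeout, ref lockTaken);
            return lockTaken;      // false: the timeout elapsed first
        }
        finally
        {
            if (lockTaken) Monitor.Exit (locker);
            release.Set();
            holder.Join();
        }
    }
}
```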
Choosing the Synchronization Object


You can use any object visible to each of the partaking threads as a synchronizing 
object, subject to one hard rule: it must be a reference type. The synchronizing 
object is typically private (because this helps to encapsulate the locking logic) and is 
typically an instance or static field. The synchronizing object can double as the 
object it’s protecting, as the _list field does in the following example: 


class ThreadSafe
{
  List <string> _list = new List <string>();

  void Test()
  {
    lock (_list)
      _list.Add ("Item 1");
  }
}


A field dedicated for the purpose of locking (such as _locker, in the example prior) 
allows precise control over the scope and granularity of the lock. You also can use 
the containing object (this) as a synchronization object: 


lock (this) { ... } 
Or even its type: 
lock (typeof (Widget)) { ... } // For protecting access to statics 


The disadvantage of locking in this way is that you're not encapsulating the locking 
logic, so it becomes more difficult to prevent deadlocking and excessive blocking. 


You can also lock on local variables captured by lambda expressions or anonymous 
methods. 


Locking doesn't restrict access to the synchronizing object itself in any way. In other words, x.ToString() will not block because another thread has called lock(x); both threads must call lock(x) in order for blocking to occur.


When to Lock 


As a basic rule, you need to lock around accessing any writable shared field. Even in 
the simplest case—an assignment operation on a single field—you must consider 
synchronization. In the following class, neither the Increment nor the Assign 
method is thread-safe: 


class ThreadUnsafe
{
  static int _x;
  static void Increment() { _x++; }
  static void Assign()    { _x = 123; }
}
Here are thread-safe versions of Increment and Assign:


static readonly object _locker = new object(); 
static int _x; 


static void Increment() { lock (_locker) _x++; } 
static void Assign() { lock (_locker) _x = 123; } 


Without locks, two problems can arise:

• Operations such as incrementing a variable (or even reading/writing a variable, under certain conditions) are not atomic.

• The compiler, CLR, and processor are entitled to reorder instructions and cache variables in CPU registers to improve performance, as long as such optimizations don't change the behavior of a single-threaded program (or a multithreaded program that uses locks).


Locking mitigates the second problem because it creates a memory barrier before and after the lock. A memory barrier is a "fence" through which the effects of reordering and caching cannot penetrate.


This applies not just to locks, but to all synchronization constructs. So, if your use of a signaling construct, for instance, ensures that just one thread reads/writes a variable at a time, you don't need to lock. Hence, the following code is thread-safe without locking around x:

var signal = new ManualResetEvent (false);
int x = 0;
new Thread (() => { x++; signal.Set(); }).Start();
signal.WaitOne();
Console.WriteLine (x);    // 1 (always)

In "Nonblocking Synchronization" (online), we explain how this need arises and how memory barriers and the Interlocked class can provide alternatives to locking in these situations.


Locking and Atomicity 


If a group of variables are always read and written within the same lock, you can say that the variables are read and written atomically. Let's suppose that fields x and y are always read and assigned within a lock on object locker:


lock (locker) { if (x != 0) y /= x; } 


We can say that x and y are accessed atomically because the code block cannot be 
divided or preempted by the actions of another thread in such a way that it will 
change x or y and invalidate its outcome. You'll never get a division-by-zero error, 
provided that x and y are always accessed within this same exclusive lock. 
The atomicity provided by a lock is violated if an exception is
thrown within a lock block (whether or not multithreading is 
involved). For example, consider the following: 


decimal _savingsBalance, _checkBalance;

void Transfer (decimal amount)
{
  lock (_locker)
  {
    _savingsBalance += amount;
    _checkBalance -= amount + GetBankFee();
  }
}


If an exception were thrown by GetBankFee(), the bank 
would lose money. In this case, we could avoid the problem by 
calling GetBankFee earlier. A solution for more complex cases 
is to implement “rollback” logic within a catch or finally 
block. 
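The sketch below applies the first fix. It computes the fee before entering the lock, so nothing is modified if the fee calculation throws. Account is a hypothetical type. A Func<decimal> stands in for GetBankFee so that the failure case can be simulated.

```csharp
using System;

// Hypothetical account type; _getBankFee stands in for GetBankFee().
class Account
{
    readonly object _locker = new object();
    decimal _savingsBalance, _checkBalance;
    readonly Func<decimal> _getBankFee;

    public Account (decimal savings, decimal check, Func<decimal> getBankFee)
    {
        _savingsBalance = savings;
        _checkBalance = check;
        _getBankFee = getBankFee;
    }

    public void Transfer (decimal amount)
    {
        decimal fee = _getBankFee();   // may throw: call it before touching any state
        lock (_locker)
        {
            _savingsBalance += amount;
            _checkBalance -= amount + fee;
        }
    }

    public (decimal Savings, decimal Check) Balances
    {
        get { lock (_locker) return (_savingsBalance, _checkBalance); }
    }
}
```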


Instruction atomicity is a different, albeit analogous, concept: an instruction is 
atomic if it executes indivisibly on the underlying processor. 
Nested Locking


A thread can repeatedly lock the same object in a nested (reentrant) fashion: 





lock (locker) 
lock (locker) 
lock (locker) 


{ 
// Do something... 
} 
Alternatively: 


Monitor.Enter (locker); Monitor.Enter (locker); Monitor.Enter (locker); 


// Do something... 
Monitor.Exit (locker); Monitor.Exit (locker); Monitor.Exit (locker); 


In these scenarios, the object is unlocked only when the outermost lock statement has exited, or a matching number of Monitor.Exit statements have executed.


Nested locking is useful when one method calls another from within a lock: 


static readonly object _locker = new object();

static void Main()
{
  lock (_locker)
  {
    AnotherMethod();
    // We still have the lock - because locks are reentrant.
  }
}

static void AnotherMethod()
{
  lock (_locker) { Console.WriteLine ("Another method"); }
}
A thread can block on only the first (outermost) lock. 
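Monitor.IsEntered makes reentrancy visible. In the sketch below (ReentrancyDemo is hypothetical), the lock is still held after AnotherMethod's inner lock block exits. It is released only when the outer block ends.

```csharp
using System.Threading;

// Hypothetical demo: reentrant locking observed with Monitor.IsEntered.
static class ReentrancyDemo
{
    static readonly object _locker = new object();

    public static (bool AfterInner, bool AfterOuter) Run()
    {
        bool afterInner;
        lock (_locker)
        {
            AnotherMethod();                           // re-enters the same lock
            afterInner = Monitor.IsEntered (_locker);  // still held by the outer lock
        }
        return (afterInner, Monitor.IsEntered (_locker));
    }

    static void AnotherMethod()
    {
        lock (_locker) { /* nested acquisition succeeds immediately */ }
    }
}
```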


Deadlocks 


A deadlock happens when two threads each wait for a resource held by the other, so 
neither can proceed. The easiest way to illustrate this is with two locks: 


object locker1 = new object();
object locker2 = new object();

new Thread (() => {
                    lock (locker1)
                    {
                      Thread.Sleep (1000);
                      lock (locker2);      // Deadlock
                    }
                  }).Start();
lock (locker2)
{
  Thread.Sleep (1000);
  lock (locker1);                          // Deadlock
}


You can create more elaborate deadlocking chains with three or more threads. 


The CLR, in a standard hosting environment, is not like SQL Server and does not automatically detect and resolve deadlocks by terminating one of the offenders. A threading deadlock causes participating threads to block indefinitely, unless you've specified a locking timeout. (Under the SQL CLR integration host, however, deadlocks are automatically detected and a [catchable] exception is thrown on one of the threads.)


Deadlocking is one of the most difficult problems in multithreading—especially 
when there are many interrelated objects. Fundamentally, the hard problem is that 
you can’t be sure what locks your caller has taken out. 


So, you might lock private field a within your class x, unaware that your caller (or 
caller’s caller) has already locked field b within class y. Meanwhile, another thread is 
doing the reverse—creating a deadlock. Ironically, the problem is exacerbated by 
(good) object-oriented design patterns, because such patterns create call chains that 
are not determined until runtime. 


The popular advice, "lock objects in a consistent order to avoid deadlocks," although helpful in our initial example, is difficult to apply to the scenario just described. A better strategy is to be wary of locking around calls to methods in objects that might have references back to your own object. Also, consider whether you really need to lock around calls to methods in other classes (often you do, as you'll see in "Locking and Thread Safety" on page 890, but sometimes there are other options).
Relying more on higher-level synchronization options such as task continuations/combinators, data parallelism, and immutable types (later in this chapter) can lessen the need for locking.
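Consistent ordering works well when you control all the locks involved. The sketch below assumes a hypothetical OrderedAccount type in which each instance gets a unique Id and a private lock object. A transfer always locks the account with the lower Id first. As a result, two threads transferring in opposite directions can't each hold the lock the other needs.

```csharp
using System.Threading;

// Hypothetical type: locks are always acquired in ascending Id order.
class OrderedAccount
{
    static int _nextId;
    public readonly int Id = Interlocked.Increment (ref _nextId);
    readonly object _locker = new object();
    public decimal Balance;

    public static void Transfer (OrderedAccount from, OrderedAccount to, decimal amount)
    {
        OrderedAccount first  = from.Id < to.Id ? from : to;
        OrderedAccount second = first == from ? to : from;
        lock (first._locker)
        lock (second._locker)
        {
            from.Balance -= amount;
            to.Balance += amount;
        }
    }
}
```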


Here is an alternative way to perceive the problem: when you 
call out to other code while holding a lock, the encapsulation 
of that lock subtly leaks. This is not a fault in the CLR; it’s a 
fundamental limitation of locking in general. The problems of 
locking are being addressed in various research projects, 
including Software Transactional Memory. 


Another deadlocking scenario arises when calling Dispatcher.Invoke (in a WPF application) or Control.Invoke (in a Windows Forms application) while in possession of a lock. If the user interface happens to be running another method that's waiting on the same lock, a deadlock will happen right there. You often can fix this simply by calling BeginInvoke instead of Invoke (or relying on asynchronous functions, which do this implicitly when a synchronization context is present). Alternatively, you can release your lock before calling Invoke, although this won't work if your caller took out the lock.


Performance 


Locking is fast: you can expect to acquire and release a lock in less than 20 nanoseconds on a 2020-era computer if the lock is uncontended. If it is contended, the consequential context switch moves the overhead closer to the microsecond region, although it can be longer before the thread is actually rescheduled.


Mutex 


A Mutex is like a C# lock, but it can work across multiple processes. In other words, Mutex can be computer-wide as well as application-wide. Acquiring and releasing an uncontended Mutex takes around half a microsecond, more than 20 times slower than a lock.


With a Mutex class, you call the WaitOne method to lock and ReleaseMutex to unlock. Just as with the lock statement, a Mutex can be released only from the same thread that obtained it.


If you forget to call ReleaseMutex and simply call Close or 
Dispose, an AbandonedMutexException will be thrown upon 
anyone else waiting upon that mutex. 
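The sketch below illustrates the thread-affinity rule. MutexOwnership is hypothetical, and it uses an unnamed Mutex, which is visible only within the process. Calling ReleaseMutex from a thread that doesn't own the mutex throws ApplicationException. The owning thread can then release the mutex normally.

```csharp
using System;
using System.Threading;

// Hypothetical demo: a Mutex can be released only by the thread that acquired it.
static class MutexOwnership
{
    public static bool ReleaseFromOtherThreadThrows()
    {
        using var mutex = new Mutex();   // unnamed, initially unowned
        mutex.WaitOne();                 // this thread now owns the mutex

        bool threw = false;
        var other = new Thread (() =>
        {
            try { mutex.ReleaseMutex(); }
            catch (ApplicationException) { threw = true; }   // caller doesn't own it
        });
        other.Start();
        other.Join();

        mutex.ReleaseMutex();            // the owning thread releases it
        return threw;
    }
}
```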


A common use for a cross-process Mutex is to ensure that only one instance of a 
program can run at a time. Here's how it’s done: 


class OneAtATimePlease
{
  static void Main()
  {
    // Naming a Mutex makes it available computer-wide. Use a name that's
    // unique to your company and application (e.g., include your URL).

    using var mutex = new Mutex (true, @"Global\oreilly.com OneAtATimeDemo");

    // Wait a few seconds if contended, in case another instance
    // of the program is still in the process of shutting down.

    if (!mutex.WaitOne (TimeSpan.FromSeconds (3), false))
    {
      Console.WriteLine ("Another instance of the app is running. Bye!");
      return;
    }
    try { RunProgram(); }
    finally { mutex.ReleaseMutex(); }
  }

  static void RunProgram()
  {
    Console.WriteLine ("Running. Press Enter to exit");
    Console.ReadLine();
  }
}


If you're running under Terminal Services or in separate Unix consoles, a computer-wide Mutex is ordinarily visible only to applications in the same session. To make it visible to all terminal server sessions, prefix its name with Global\, as shown in the example.


Locking and Thread Safety 


A program or method is thread-safe if it can work correctly in any multithreading scenario. Thread safety is achieved primarily with locking and by reducing the possibilities for thread interaction.


General-purpose types are rarely thread-safe in their entirety, for the following 
reasons: 


• The development burden in full thread safety can be significant, particularly if a type has many fields (each field is a potential for interaction in an arbitrarily multithreaded context).

• Thread safety can entail a performance cost (payable, in part, whether or not the type is actually used by multiple threads).

• A thread-safe type does not necessarily make the program using it thread-safe, and often the work involved in the latter makes the former redundant.


Thread safety is thus usually implemented just where it needs to be in order to
handle a specific multithreading scenario.


There are, however, a few ways to “cheat” and have large and complex classes run
safely in a multithreaded environment. One is to sacrifice granularity by wrapping
large sections of code (even access to an entire object) within a single exclusive
lock, enforcing serialized access at a high level. This tactic is, in fact, essential if you
want to use thread-unsafe third-party code (or most .NET Core types, for that
matter) in a multithreaded context. The trick is simply to use the same exclusive lock to
protect access to all properties, methods, and fields on the thread-unsafe object. The
solution works well if the object’s methods all execute quickly (otherwise, there will
be a lot of blocking).
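As a rough sketch of that single-lock tactic, the class below wraps an ordinary Dictionary<string,int> (standing in for any thread-unsafe object) so that every member goes through the same _locker. The names SynchronizedCounter, Increment, Get, and Demo are hypothetical, chosen for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical wrapper: every access to the thread-unsafe Dictionary
// goes through the same exclusive lock, serializing all callers.
class SynchronizedCounter
{
  readonly object _locker = new object();
  readonly Dictionary<string, int> _counts = new Dictionary<string, int>();

  public void Increment (string key)
  {
    lock (_locker)
    {
      _counts.TryGetValue (key, out int current);   // 0 if the key is absent
      _counts [key] = current + 1;
    }
  }

  public int Get (string key)
  {
    lock (_locker)
      return _counts.TryGetValue (key, out int value) ? value : 0;
  }

  // Demonstration: 4 tasks each increment "hits" 10,000 times.
  public static int Demo()
  {
    var counter = new SynchronizedCounter();
    Task[] tasks = Enumerable.Range (0, 4)
      .Select (_ => Task.Run (() =>
      {
        for (int i = 0; i < 10_000; i++) counter.Increment ("hits");
      }))
      .ToArray();
    Task.WaitAll (tasks);
    return counter.Get ("hits");   // no increments are lost
  }
}
```

Because the lock is held only for a lookup and a store, contention stays low; if these members did slow work, every caller would queue behind it.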


Primitive types aside, few .NET Core types, when instantiated,
are thread-safe for anything more than concurrent read-only
access. The onus is on the developer to superimpose thread
safety, typically with exclusive locks. (The collections in
System.Collections.Concurrent that we cover in Chapter 23
are an exception.)


Another way to cheat is to minimize thread interaction by minimizing shared data.
This is an excellent approach and is used implicitly in “stateless” middle-tier
application and web-page servers. Because multiple client requests can arrive
simultaneously, the server methods they call must be thread-safe. A stateless design (popular
for reasons of scalability) intrinsically limits the possibility of interaction because
classes do not save data between requests. Thread interaction is then limited just to
the static fields that you might choose to create, for such purposes as caching
commonly used data in memory and in providing infrastructure services such as
authentication and auditing.


Yet another solution (in rich-client applications) is to run code that accesses shared 
state on the UI thread. As we saw in Chapter 14, asynchronous functions make this 
easy. 


Thread Safety and .NET Core Types 


You can use locking to convert thread-unsafe code into thread-safe code. A good
application of this is .NET Core: nearly all of its nonprimitive types are not
thread-safe (for anything more than read-only access) when instantiated, and yet you can
use them in multithreaded code if all access to any given object is protected via a
lock. Here’s an example in which two threads simultaneously add an item to the
same List collection and then enumerate the list:


using System;
using System.Collections.Generic;
using System.Threading;

class ThreadSafe
{
  static List <string> _list = new List <string>();

  static void Main()
  {
    new Thread (AddItem).Start();
    new Thread (AddItem).Start();
  }

  static void AddItem()
  {
    lock (_list) _list.Add ("Item " + _list.Count);
    string[] items;
    lock (_list) items = _list.ToArray();
    foreach (string s in items) Console.WriteLine (s);
  }
}


In this case, we're locking on the _list object itself. If we had two interrelated lists, 
we would need to choose a common object upon which to lock (we could nominate 
one of the lists, or better: use an independent field). 


Enumerating .NET collections is also thread-unsafe in the sense that an exception is
thrown if the list is modified during enumeration. Rather than locking for the
duration of enumeration, in this example, we first copy the items to an array. This avoids
holding the lock excessively if what we're doing during enumeration is potentially
time-consuming. (Another solution is to use a reader/writer lock; see
“Reader/Writer Locks” on page 898.)


Locking around thread-safe objects 


Sometimes, you also need to lock around accessing thread-safe objects. To illustrate, 
imagine that .NET Core’s List class was, indeed, thread-safe, and we want to add an 
item to a list: 


if (!_list.Contains (newItem)) _list.Add (newItem); 


Regardless of whether the list was thread-safe, this statement is certainly not! The
whole if statement would need to be wrapped in a lock in order to prevent
preemption in between testing for containership and adding the new item. This same lock
would then need to be used everywhere we modified that list. For instance, the
following statement would also need to be wrapped in the identical lock to ensure that
it did not preempt the former statement:


_list.Clear(); 


In other words, we would need to lock exactly as with our thread-unsafe collection 
classes (making the List class’s hypothetical thread safety redundant). 
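A minimal sketch of what that looks like, assuming a hypothetical UniqueItems class that owns the list privately: the compound test-then-add and Clear share one lock object, so no thread can slip in between Contains and Add.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical holder class: both the compound "test then add" and Clear
// use the same lock object, so neither can preempt the other.
class UniqueItems
{
  readonly object _locker = new object();
  readonly List<string> _list = new List<string>();

  public bool AddIfAbsent (string newItem)
  {
    lock (_locker)
    {
      if (_list.Contains (newItem)) return false;
      _list.Add (newItem);
      return true;
    }
  }

  public void Clear()
  {
    lock (_locker) _list.Clear();
  }

  public int Count { get { lock (_locker) return _list.Count; } }

  // 8 tasks race to add the same item; exactly one should succeed.
  public static (int added, int count) Demo()
  {
    var items = new UniqueItems();
    Task<bool>[] results = Enumerable.Range (0, 8)
      .Select (_ => Task.Run (() => items.AddIfAbsent ("x")))
      .ToArray();
    Task.WaitAll (results);
    return (results.Count (t => t.Result), items.Count);
  }
}
```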


Locking around accessing a collection can cause excessive
blocking in highly concurrent environments. To this
end, .NET Core provides a thread-safe queue, stack, and
dictionary, which we discuss in Chapter 23.


Static members 


Wrapping access to an object in a custom lock works only if all concurrent
threads are aware of, and use, the lock. This might not be the case if the object is
widely scoped. The worst case is with static members in a public type. For instance,
imagine if the static property on the DateTime struct, DateTime.Now, was not
thread-safe, and that two concurrent calls could result in garbled output or an
exception. The only way to remedy this with external locking might be to lock the
type itself, with lock (typeof (DateTime)), before calling DateTime.Now. This would
work only if all programmers agreed to do this (which is unlikely). Furthermore,
locking a type creates problems of its own.


For this reason, static members on the DateTime struct have been carefully
programmed to be thread-safe. This is a common pattern throughout .NET Core: static
members are thread-safe; instance members are not. Following this pattern also
makes sense when writing types for public consumption, so as not to create
impossible thread-safety conundrums. In other words, by making static methods
thread-safe, you're programming so as not to preclude thread safety for consumers of that
type.


Thread safety in static methods is something that you must 
explicitly code: it doesn’t happen automatically by virtue of the 
method being static! 


Read-only thread safety 


Making types thread-safe for concurrent read-only access (where possible) is
advantageous because it means that consumers can avoid excessive locking. Many .NET
Core types follow this principle: collections, for instance, are thread-safe for
concurrent readers.


Following this principle yourself is simple: if you document a type as being
thread-safe for concurrent read-only access, don't write to fields within methods that a
consumer would expect to be read-only (or lock around doing so). For instance, in
implementing a ToArray() method in a collection, you might begin by compacting
the collection’s internal structure. However, this would make it thread-unsafe for
consumers that expected this to be read-only.


Read-only thread safety is one of the reasons that enumerators are separate from 
enumerables: two threads can simultaneously enumerate over a collection because 
each gets a separate enumerator object. 


In the absence of documentation, it pays to be cautious
before assuming that a method is read-only in nature. A good
example is the Random class: when you call Random.Next(), its
internal implementation requires that it update private seed
values. Therefore, you must either lock around using the
Random class or maintain a separate instance per thread.
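One way to keep a separate Random per thread is the framework's ThreadLocal<T> class. The sketch below is an illustration rather than a prescribed pattern; the PerThreadRandom name is hypothetical, and seeding each instance from Guid.NewGuid() is just one assumed way to give threads distinct seeds.

```csharp
using System;
using System.Threading;

// Hypothetical helper: one Random instance per thread, so Next() never
// touches seed state shared with another thread and no lock is needed.
static class PerThreadRandom
{
  static readonly ThreadLocal<Random> _random =
    new ThreadLocal<Random> (() => new Random (Guid.NewGuid().GetHashCode()));

  // Returns a value in [0, maxValue) from the calling thread's own Random.
  public static int Next (int maxValue) => _random.Value.Next (maxValue);
}
```

The trade-off versus locking is memory: each thread that calls Next holds its own Random object for as long as the ThreadLocal lives.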


Thread Safety in Application Servers 


Application servers need to be multithreaded to handle simultaneous client
requests. ASP.NET Core and Web API applications are implicitly multithreaded.
This means that when writing code on the server side, you must consider thread
safety if there’s any possibility of interaction among the threads processing client
requests. Fortunately, such a possibility is rare; a typical server class is either
stateless (no fields) or has an activation model that creates a separate object instance for
each client or each request. Interaction usually arises only through static fields,
sometimes used to cache parts of a database in memory to improve performance.
For example, suppose that you have a RetrieveUser method that queries a database:


// User is a custom class with fields for user data 
internal User RetrieveUser (int id) { ... } 


If this method were called frequently, you could improve performance by caching 
the results in a static Dictionary. Here's a conceptually simple solution that takes 
thread safety into account: 


static class UserCache
{
  static Dictionary <int, User> _users = new Dictionary <int, User>();

  internal static User GetUser (int id)
  {
    User u = null;

    lock (_users)
      if (_users.TryGetValue (id, out u))
        return u;

    u = RetrieveUser (id);   // Method to retrieve from database
    lock (_users) _users [id] = u;
    return u;
  }
}


We must, at a minimum, lock around reading and updating the dictionary to ensure 
thread safety. In this example, we choose a practical compromise between simplicity 
and performance in locking. Our design creates a small potential for inefficiency: if 
two threads simultaneously called this method with the same previously unretrieved 
id, the RetrieveUser method would be called twice—and the dictionary would be 
updated unnecessarily. Locking once across the whole method would prevent this, 
but it would create a worse inefficiency: the entire cache would be locked up for the 
duration of calling RetrieveUser, during which time other threads would be 
blocked in retrieving any user. 


For an ideal solution, we need to use the strategy we described in “Completing
synchronously” on page 621 in Chapter 14. Instead of caching User, we cache
Task<User>, which the caller then awaits:


static class UserCache
{
  static Dictionary <int, Task<User>> _userTasks =
     new Dictionary <int, Task<User>>();

  internal static Task<User> GetUserAsync (int id)
  {
    lock (_userTasks)
      if (_userTasks.TryGetValue (id, out var userTask))
        return userTask;
      else
        return _userTasks [id] = Task.Run (() => RetrieveUser (id));
  }
}
Notice that we now have a single lock that covers the entire method's logic. We can 
do this without hurting concurrency because all we're doing inside the lock is 
accessing the dictionary and (potentially) initiating an asynchronous operation (by 
calling Task.Run). Should two threads call this method at the same time with the 
same ID, they'll both end up awaiting the same task, which is exactly the outcome 
we want. 
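To see the shared-task behavior in isolation, the sketch below pairs the same GetUserAsync logic with a hypothetical User class and a fake RetrieveUser that sleeps and counts its calls (neither is part of the original example). Two back-to-back calls with the same ID should return the very same Task object and hit the "database" only once.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class User { public int Id; }   // hypothetical stand-in for the User class

static class UserTaskCacheDemo
{
  static readonly Dictionary<int, Task<User>> _userTasks =
    new Dictionary<int, Task<User>>();

  // Counts how many times the fake database is actually queried.
  public static int RetrieveCount;

  // Hypothetical slow query: sleeps, then returns a new User.
  static User RetrieveUser (int id)
  {
    Interlocked.Increment (ref RetrieveCount);
    Thread.Sleep (100);
    return new User { Id = id };
  }

  // Same logic as GetUserAsync above: one short lock; the slow work runs
  // in the task, outside the lock, and concurrent callers share the task.
  internal static Task<User> GetUserAsync (int id)
  {
    lock (_userTasks)
      if (_userTasks.TryGetValue (id, out var userTask))
        return userTask;
      else
        return _userTasks [id] = Task.Run (() => RetrieveUser (id));
  }
}
```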


Immutable Objects 


An immutable object is one whose state cannot be altered, externally or internally.
The fields in an immutable object are typically declared read-only and are fully
initialized during construction.


Immutability is a hallmark of functional programming, where instead of mutating
an object, you create a new object with different properties. LINQ follows this
paradigm. Immutability is also valuable in multithreading in that it avoids the problem
of shared writable state by eliminating (or minimizing) the writable state itself.


One pattern is to use immutable objects to encapsulate a group of related fields, to 
minimize lock durations. To take a very simple example, suppose that we had two 
fields, as follows: 


int _percentComplete; 
string _statusMessage; 


Now let’s assume that we want to read and write them atomically. Rather than
locking around these fields, we could define the following immutable class:


class ProgressStatus    // Represents progress of some activity
{
  public readonly int PercentComplete;
  public readonly string StatusMessage;

  // This class might have many more fields...

  public ProgressStatus (int percentComplete, string statusMessage)
  {
    PercentComplete = percentComplete;
    StatusMessage = statusMessage;
  }
}


Then, we could define a single field of that type, along with a locking object: 


readonly object _statusLocker = new object(); 
ProgressStatus _status; 


We can now read and write values of that type without holding a lock for more than 
a single assignment: 


var status = new ProgressStatus (50, "Working on it");
// Imagine we were assigning many more fields...
lock (_statusLocker) _status = status;    // Very brief lock


To read the object, we first obtain a copy of the object reference (within a lock). 
Then, we can read its values without needing to hold on to the lock: 


ProgressStatus status;
lock (_statusLocker) status = _status;    // Again, a brief lock
int pc = status.PercentComplete;
string msg = status.StatusMessage;
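Putting the write and read halves together, here is a small sketch (the ProgressReporter class and its Demo method are hypothetical) in which one thread publishes snapshots while another reads them. Because a ProgressStatus is never mutated after construction, a reader should never observe a percentage from one update paired with a message from another.

```csharp
using System.Threading;

class ProgressStatus
{
  public readonly int PercentComplete;
  public readonly string StatusMessage;

  public ProgressStatus (int percentComplete, string statusMessage)
  {
    PercentComplete = percentComplete;
    StatusMessage = statusMessage;
  }
}

// Hypothetical reporter that publishes immutable snapshots.
class ProgressReporter
{
  readonly object _statusLocker = new object();
  ProgressStatus _status = new ProgressStatus (0, "Starting");

  public void Report (int percent, string message)
  {
    var status = new ProgressStatus (percent, message);  // built outside the lock
    lock (_statusLocker) _status = status;                // very brief lock
  }

  public ProgressStatus Snapshot()
  {
    lock (_statusLocker) return _status;                  // copy the reference only
  }

  // A writer publishes (i, "Step i") pairs while this thread reads snapshots.
  public static bool Demo()
  {
    var reporter = new ProgressReporter();
    var writer = new Thread (() =>
    {
      for (int i = 1; i <= 100; i++) reporter.Report (i, "Step " + i);
    });
    writer.Start();

    bool consistent = true;
    for (int n = 0; n < 10_000; n++)
    {
      ProgressStatus s = reporter.Snapshot();
      if (s.PercentComplete > 0 && s.StatusMessage != "Step " + s.PercentComplete)
        consistent = false;                               // fields from two updates
    }
    writer.Join();
    return consistent && reporter.Snapshot().PercentComplete == 100;
  }
}
```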


Nonexclusive Locking 


The nonexclusive locking constructs serve to limit concurrency. In this section, we
cover semaphores and reader/writer locks, and also illustrate how the SemaphoreSlim
class can limit concurrency with asynchronous operations.


Semaphore 


A semaphore is like a nightclub: it has a certain capacity, enforced by a bouncer.
When the club is full, no more people can enter, and a queue builds up outside.
Then, for each person who leaves, one person enters. The constructor requires a
minimum of two arguments: the number of places currently available in the
nightclub and the club's total capacity.


A semaphore with a capacity of one is similar to a Mutex or lock, except that the 
semaphore has no “owner”—it’s thread agnostic. Any thread can call Release on a 
Semaphore, whereas with Mutex and lock, only the thread that obtained the lock can 
release it. 


There are two functionally similar versions of this class:
Semaphore and SemaphoreSlim. The latter has been optimized
to meet the low-latency demands of parallel programming. It’s
also useful in traditional multithreading because it lets you
specify a cancellation token when waiting (see “Cancellation”
on page 625 in Chapter 14), and it exposes a WaitAsync
method for asynchronous programming. You cannot use it,
however, for interprocess signaling.


Semaphore incurs about one microsecond in calling WaitOne
and Release; SemaphoreSlim incurs about one-tenth of that.


Semaphores can be useful in limiting concurrency—preventing too many threads 
from executing a particular piece of code at once. In the following example, five 
threads try to enter a nightclub that allows only three threads in at once: 


class TheClub      // No door lists!
{
  static SemaphoreSlim _sem = new SemaphoreSlim (3);    // Capacity of 3

  static void Main()
  {
    for (int i = 1; i <= 5; i++) new Thread (Enter).Start (i);
  }

  static void Enter (object id)
  {
    Console.WriteLine (id + " wants to enter");
    _sem.Wait();
    Console.WriteLine (id + " is in!");           // Only three threads
    Thread.Sleep (1000 * (int) id);               // can be here at
    Console.WriteLine (id + " is leaving");       // a time.
    _sem.Release();
  }
}


1 wants to enter
1 is in!
2 wants to enter
2 is in!
3 wants to enter
3 is in!
4 wants to enter
5 wants to enter
1 is leaving
4 is in!
2 is leaving
5 is in!


A Semaphore, if named, can span processes in the same way as a Mutex (named
Semaphores are available only on Windows, whereas a named Mutex also works on
Unix platforms).


Asynchronous semaphores and locks


It is illegal to lock across an await expression:


lock (_locker)
{
  await Task.Delay (1000);    // Compilation error
  ...
}


Doing so would make no sense, because locks are held by a thread, which typically
changes when returning from an await. Locking also blocks, and blocking for a
potentially long period of time is exactly what you're trying to avoid with
asynchronous functions.


It’s still sometimes desirable, however, to make asynchronous operations execute 
sequentially—or limit the parallelism such that not more than n operations execute 
at once. For example, consider a web browser: it needs to perform asynchronous 
downloads in parallel, but it might want to impose a limit such that a maximum of 
10 downloads happen at a time. We can achieve this by using a SemaphoreSlim: 
SemaphoreSlim _semaphore = new SemaphoreSlim (10);

async Task<byte[]> DownloadWithSemaphoreAsync (string uri)
{
  await _semaphore.WaitAsync();
  try { return await new WebClient().DownloadDataTaskAsync (uri); }
  finally { _semaphore.Release(); }
}


Reducing the semaphore’s initialCount to 1 reduces the maximum parallelism to 
1, turning this into an asynchronous lock. 
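A short sketch of that asynchronous lock, assuming a hypothetical DepositAsync method that reads a shared balance, awaits (simulating I/O), and then writes the balance back. Without the semaphore, concurrent calls could interleave at the await and lose updates; with initialCount set to 1, each read-await-write sequence runs alone.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class AsyncLockDemo
{
  // initialCount of 1: at most one caller inside at a time (an async lock).
  static readonly SemaphoreSlim _mutex = new SemaphoreSlim (1);
  static int _balance;

  // Hypothetical read-await-write operation.
  static async Task DepositAsync (int amount)
  {
    await _mutex.WaitAsync();
    try
    {
      int current = _balance;
      await Task.Delay (1);          // simulated asynchronous I/O
      _balance = current + amount;
    }
    finally { _mutex.Release(); }
  }

  // Runs 20 concurrent deposits of 5 and returns the final balance.
  public static int Run()
  {
    Task.WaitAll (Enumerable.Range (0, 20).Select (_ => DepositAsync (5)).ToArray());
    return _balance;
  }
}
```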


Writing an EnterAsync extension method 


The following extension method simplifies the asynchronous use of SemaphoreSlim 
by using the Disposable class that we wrote in “Anonymous Disposal” on page 527 
in Chapter 12: 


public static async Task<IDisposable> EnterAsync (this SemaphoreSlim ss)
{
  await ss.WaitAsync().ConfigureAwait (false);
  return Disposable.Create (() => ss.Release());
}


With this method, we can rewrite our DownloadWithSemaphoreAsync method as 
follows: 


async Task<byte[]> DownloadWithSemaphoreAsync (string uri)
{
  using (await _semaphore.EnterAsync())
    return await new WebClient().DownloadDataTaskAsync (uri);
}


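The Disposable class from Chapter 12 is not reproduced in this section. To make EnterAsync compile on its own, the sketch below includes a minimal stand-in that assumes only a Create(Action) factory is needed; it is not the Chapter 12 implementation. It also guards against double disposal, which would otherwise release the semaphore twice.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Minimal stand-in for the Chapter 12 Disposable class (an assumption:
// only Create(Action) is needed). Runs the action at most once.
sealed class Disposable : IDisposable
{
  Action _onDispose;
  Disposable (Action onDispose) => _onDispose = onDispose;

  public static Disposable Create (Action onDispose) => new Disposable (onDispose);

  public void Dispose()
  {
    // Swap the action out atomically so a second Dispose does nothing.
    Interlocked.Exchange (ref _onDispose, null)?.Invoke();
  }
}

static class SemaphoreExtensions
{
  public static async Task<IDisposable> EnterAsync (this SemaphoreSlim ss)
  {
    await ss.WaitAsync().ConfigureAwait (false);
    return Disposable.Create (() => ss.Release());
  }
}
```

With these two types in scope, `using (await _semaphore.EnterAsync()) { ... }` releases the semaphore when the using block exits, including when an exception is thrown.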
Reader/Writer Locks 


Quite often, instances of a type are thread-safe for concurrent read operations, but
not for concurrent updates (nor for a concurrent read and update). This can also be
true with resources such as a file. Although protecting instances of such types with a
simple exclusive lock for all modes of access usually does the trick, it can
unreasonably restrict concurrency if there are many readers and just occasional updates. An
example of where this could occur is in a business application server, for which
commonly used data is cached for fast retrieval in static fields. The
ReaderWriterLockSlim class is designed to provide maximum-availability locking in just this
scenario.
ReaderWriterLockSlim is a replacement for the older “fat”
ReaderWriterLock class. The latter is similar in functionality,
but it is several times slower and has an inherent design fault
in its mechanism for handling lock upgrades.


When compared to an ordinary lock (Monitor.Enter/Exit),
ReaderWriterLockSlim is still twice as slow, though. The
trade-off is less contention (when there's a lot of reading and
minimal writing).


With both classes, there are two basic kinds of lock—a read lock and a write lock: 


• A write lock is universally exclusive.


• A read lock is compatible with other read locks.


So, a thread holding a write lock blocks all other threads trying to obtain a read or 
write lock (and vice versa). But if no thread holds a write lock, any number of 
threads may concurrently obtain a read lock. 


ReaderWriterLockSlim defines the following methods for obtaining and releasing 
read/write locks: 


public void EnterReadLock(); 
public void ExitReadLock(); 

public void EnterWriteLock(); 
public void ExitWriteLock(); 


Additionally, there are “Try” versions of all EnterXXX methods that accept timeout
arguments in the style of Monitor.TryEnter (timeouts can occur quite easily if the
resource is heavily contended). ReaderWriterLock provides similar methods,
named AcquireXXX and ReleaseXXX. These throw an ApplicationException if a
timeout occurs, rather than returning false.
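A brief sketch of the Try variants in use, with try/finally so the lock is always released. The GuardedList class and its members are hypothetical; TryEnterWriteLock and TryEnterReadLock (TimeSpan overloads) are real ReaderWriterLockSlim methods. The static demo holds a read lock on another thread to show a writer giving up after a 50 ms timeout instead of blocking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical list guarded by ReaderWriterLockSlim using timeouts.
class GuardedList
{
  readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
  readonly List<int> _items = new List<int>();

  public bool TryAdd (int item, TimeSpan timeout)
  {
    if (!_rw.TryEnterWriteLock (timeout)) return false;   // timed out
    try { _items.Add (item); return true; }
    finally { _rw.ExitWriteLock(); }
  }

  public int? TryCount (TimeSpan timeout)
  {
    if (!_rw.TryEnterReadLock (timeout)) return null;     // timed out
    try { return _items.Count; }
    finally { _rw.ExitReadLock(); }
  }

  // While another thread holds a read lock, a writer with a short
  // timeout should give up rather than wait indefinitely.
  public static bool WriterTimesOutWhileReaderHolds()
  {
    var list = new GuardedList();
    var readerHasLock = new ManualResetEventSlim();
    var releaseReader = new ManualResetEventSlim();
    var reader = new Thread (() =>
    {
      list._rw.EnterReadLock();
      try { readerHasLock.Set(); releaseReader.Wait(); }
      finally { list._rw.ExitReadLock(); }
    });
    reader.Start();
    readerHasLock.Wait();
    bool added = list.TryAdd (42, TimeSpan.FromMilliseconds (50));
    releaseReader.Set();
    reader.Join();
    return !added;
  }
}
```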


The following program demonstrates ReaderWriterLockSlim. Three threads
continually enumerate a list, while two further threads append a random number to the
list every 100 ms. A read lock protects the list readers, and a write lock protects the
list writers:


class SlimDemo
{
  static ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
  static List<int> _items = new List<int>();
  static Random _rand = new Random();

  static void Main()
  {
    new Thread (Read).Start();
    new Thread (Read).Start();
    new Thread (Read).Start();

    new Thread (Write).Start ("A");
    new Thread (Write).Start ("B");
  }

  static void Read()
  {
    while (true)
    {
      _rw.EnterReadLock();
      foreach (int i in _items) Thread.Sleep (10);
      _rw.ExitReadLock();
    }
  }

  static void Write (object threadID)
  {
    while (true)
    {
      int newNumber = GetRandNum (100);
      _rw.EnterWriteLock();
      _items.Add (newNumber);
      _rw.ExitWriteLock();
      Console.WriteLine ("Thread " + threadID + " added " + newNumber);
      Thread.Sleep (100);
    }
  }

  static int GetRandNum (int max) { lock (_rand) return _rand.Next (max); }
}


In production code, you'd typically add try/finally blocks to
ensure that locks were released if an exception were thrown.


Here’s the result: 


Thread B added 61 
Thread A added 83 
Thread B added 55 
Thread A added 33 


ReaderWriterLockSlim allows more concurrent Read activity than a simple lock. 
We can illustrate this by inserting the following line in the Write method, at the 
start of the while loop: 


Console.WriteLine (_rw.CurrentReadCount + " concurrent readers");


This nearly always prints “3 concurrent readers” (the Read methods spend most of
their time inside the foreach loops). As well as CurrentReadCount,
ReaderWriterLockSlim provides the following properties for monitoring locks:


public bool IsReadLockHeld            { get; }
public bool IsUpgradeableReadLockHeld { get; }
public bool IsWriteLockHeld           { get; }

public int  WaitingReadCount          { get; }
public int  WaitingUpgradeCount       { get; }
public int  WaitingWriteCount         { get; }

public int  RecursiveReadCount        { get; }
public int  RecursiveUpgradeCount     { get; }
public int  RecursiveWriteCount       { get; }


Upgradeable locks 


Sometimes, it’s useful to swap a read lock for a write lock in a single atomic
operation. For instance, suppose that you want to add an item to a list only if the item
wasn't already present. Ideally, you'd want to minimize the time spent holding the
(exclusive) write lock, so you might proceed as follows:


1. Obtain a read lock. 


2. Test whether the item is already present in the list; if so, release the lock and 
return. 


3. Release the read lock. 

4. Obtain a write lock. 

5. Add the item. 

The problem is that another thread could sneak in and modify the list (e.g., adding
the same item) between steps 3 and 4. ReaderWriterLockSlim addresses this
through a third kind of lock called an upgradeable lock. An upgradeable lock is like a
read lock except that it can later be promoted to a write lock in an atomic operation.
Here's how you use it:


1. Call EnterUpgradeableReadLock.


2. Perform read-based activities (e.g., test whether the item is already present in
the list).


3. Call EnterWriteLock (this converts the upgradeable lock to a write lock).


4. Perform write-based activities (e.g., add the item to the list).


5. Call ExitWriteLock (this converts the write lock back to an upgradeable lock).


6. Perform any other read-based activities.


7. Call ExitUpgradeableReadLock.


From the caller’s perspective, it’s rather like nested or recursive locking.
Functionally, though, in step 3, ReaderWriterLockSlim releases your read lock and obtains a
fresh write lock, atomically.


There’s another important difference between upgradeable locks and read locks.
Although an upgradeable lock can coexist with any number of read locks, only one
upgradeable lock can itself be taken out at a time. This prevents conversion
deadlocks by serializing competing conversions, just as update locks do in SQL Server:


SQL Server        ReaderWriterLockSlim
Share lock        Read lock
Exclusive lock    Write lock
Update lock       Upgradeable lock





We can demonstrate an upgradeable lock by changing the Write method in the
preceding example such that it adds a number to the list only if it’s not already present:


while (true)
{
  int newNumber = GetRandNum (100);
  _rw.EnterUpgradeableReadLock();
  if (!_items.Contains (newNumber))
  {
    _rw.EnterWriteLock();
    _items.Add (newNumber);
    _rw.ExitWriteLock();
    Console.WriteLine ("Thread " + threadID + " added " + newNumber);
  }
  _rw.ExitUpgradeableReadLock();
  Thread.Sleep (100);
}


ReaderWriterLock can also do lock conversions, but
unreliably, because it doesn't support the concept of upgradeable
locks. This is why the designers of ReaderWriterLockSlim had
to start afresh with a new class.

Lock recursion 


Ordinarily, nested or recursive locking is prohibited with ReaderWriterLockSlim.
Hence, the following throws an exception:


var rw = new ReaderWriterLockSlim();
rw.EnterReadLock();
rw.EnterReadLock();     // Throws LockRecursionException
rw.ExitReadLock();
rw.ExitReadLock();


It runs without error, however, if you construct ReaderWriterLockSlim as follows:


var rw = new ReaderWriterLockSlim (LockRecursionPolicy.SupportsRecursion);


This ensures that recursive locking can happen only if you plan for it. Recursive
locking can create undesired complexity because it’s possible to acquire more than
one kind of lock:


rw.EnterWriteLock();
rw.EnterReadLock();
Console.WriteLine (rw.IsReadLockHeld);     // True
Console.WriteLine (rw.IsWriteLockHeld);    // True
rw.ExitReadLock();
rw.ExitWriteLock();
The basic rule is that after you've acquired a lock, subsequent recursive locks can be
less, but not greater, on the following scale:


• Read Lock → Upgradeable Lock → Write Lock


A request to promote an upgradeable lock to a write lock, however, is always legal. 
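The recursion rules can be checked directly. This sketch (the RecursionDemo class is hypothetical) shows that a nested read lock throws LockRecursionException under the default policy, that climbing from a read lock to a write lock throws even with SupportsRecursion, and that going down from a write lock to a read lock is permitted.

```csharp
using System.Threading;

static class RecursionDemo
{
  // Default policy (NoRecursion): a second EnterReadLock on the same thread throws.
  public static bool NestedReadThrowsByDefault()
  {
    var rw = new ReaderWriterLockSlim();
    rw.EnterReadLock();
    try { rw.EnterReadLock(); rw.ExitReadLock(); return false; }
    catch (LockRecursionException) { return true; }
    finally { rw.ExitReadLock(); }              // release the first read lock
  }

  // Even with SupportsRecursion, going "up" from read to write is refused.
  public static bool ReadThenWriteThrows()
  {
    var rw = new ReaderWriterLockSlim (LockRecursionPolicy.SupportsRecursion);
    rw.EnterReadLock();
    try { rw.EnterWriteLock(); rw.ExitWriteLock(); return false; }
    catch (LockRecursionException) { return true; }
    finally { rw.ExitReadLock(); }
  }

  // Going "down" (write, then read) is allowed with SupportsRecursion.
  public static bool WriteThenReadAllowed()
  {
    var rw = new ReaderWriterLockSlim (LockRecursionPolicy.SupportsRecursion);
    rw.EnterWriteLock();
    rw.EnterReadLock();
    bool both = rw.IsReadLockHeld && rw.IsWriteLockHeld;
    rw.ExitReadLock();
    rw.ExitWriteLock();
    return both;
  }
}
```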


Signaling with Event Wait Handles 


The simplest kind of signaling construct is the event wait handle (unrelated to
C# events). Event wait handles come in three flavors: AutoResetEvent,
ManualResetEvent(Slim), and CountdownEvent. The former two are based on the common
EventWaitHandle class from which they derive all their functionality.


AutoResetEvent 


An AutoResetEvent is like a ticket turnstile: inserting a ticket lets exactly one
person through. The auto in the class's name refers to the fact that an open turnstile
automatically closes or resets after someone steps through. A thread waits, or blocks,
at the turnstile by calling WaitOne (wait at this one turnstile until it opens), and a
ticket is inserted by calling the Set method. If a number of threads call WaitOne, a
queue² builds up behind the turnstile. A ticket can come from any thread; in other
words, any (unblocked) thread with access to the AutoResetEvent object can call
Set on it to release one blocked thread.


You can create an AutoResetEvent in two ways. The first is via its constructor:


var auto = new AutoResetEvent (false);


(Passing true into the constructor is equivalent to immediately calling Set upon it.) 
The second way to create an AutoResetEvent is as follows: 


var auto = new EventWaitHandle (false, EventResetMode.AutoReset); 


In the following example, a thread is started whose job is simply to wait until sig- 
naled by another thread (see Figure 22-1): 


class BasicWaitHandle
{
  static EventWaitHandle _waitHandle = new AutoResetEvent (false);

  static void Main()
  {
    new Thread (Waiter).Start();
    Thread.Sleep (1000);                  // Pause for a second...
    _waitHandle.Set();                    // Wake up the Waiter.
  }

  static void Waiter()
  {
    Console.WriteLine ("Waiting...");
    _waitHandle.WaitOne();                // Wait for notification
    Console.WriteLine ("Notified");
  }
}


// Output:
Waiting... (pause) Notified.


2 As with locks, the fairness of the queue can sometimes be violated due to nuances in the
operating system.





Figure 22-1. Signaling with an EventWaitHandle (main thread and new thread; the new thread writes “Waiting”, is BLOCKED, then writes “Notified”)


If Set is called when no thread is waiting, the handle stays open until some thread
calls WaitOne. This behavior helps avoid a race between a thread heading for the
turnstile and a thread inserting a ticket (“Oops, inserted the
ticket a microsecond too soon; now you'll have to wait indefinitely!”). However,
calling Set repeatedly on a turnstile at which no one is waiting doesn't allow an
entire party through when they arrive: only the next single person is let through and
the extra tickets are “wasted.”





Disposing Wait Handles 


After you've finished with a wait handle, you can call its Close method to release the 
OS resource. Alternatively, you can simply drop all references to the wait handle and 
allow the garbage collector to do the job for you sometime later (wait handles 
implement the disposal pattern whereby the finalizer calls Close). This is one of the 
few scenarios for which relying on this backup is (arguably) acceptable, because wait 
handles have a light OS burden. 


Wait handles are released automatically when a process exits. 











Calling Reset on an AutoResetEvent closes the turnstile (should it be open) 
without waiting or blocking. 


WaitOne accepts an optional timeout parameter, returning false if the wait ended 
because of a timeout rather than obtaining the signal. 
Calling WaitOne with a timeout of 0 tests whether a wait
handle is open, without blocking the caller. Keep in mind, though,
that doing this resets the AutoResetEvent if it’s open.
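A small sketch of the turnstile semantics using zero-timeout probes (the TurnstileDemo class is hypothetical). Two Set calls with nobody waiting still admit only one WaitOne, the probe itself closes the turnstile, and Reset closes it without waiting.

```csharp
using System.Threading;

static class TurnstileDemo
{
  // Returns the results of three zero-timeout probes on an AutoResetEvent.
  public static (bool first, bool second, bool afterReset) Probe()
  {
    using var auto = new AutoResetEvent (false);
    auto.Set();
    auto.Set();                           // extra ticket is "wasted"
    bool first  = auto.WaitOne (0);       // true: open, and this probe resets it
    bool second = auto.WaitOne (0);       // false: already closed again

    auto.Set();
    auto.Reset();                         // close without waiting
    bool afterReset = auto.WaitOne (0);   // false
    return (first, second, afterReset);
  }
}
```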


Two-way signaling 

Suppose that we want the main thread to signal a worker thread three times in a
row. If the main thread simply calls Set on a wait handle several times in rapid
succession, the second or third signal can become lost because the worker might take
time to process each signal.


The solution is for the main thread to wait until the worker’s ready before signaling 
it. We can do this by using another AutoResetEvent, as follows: 


class TwoWaySignaling
{
  static EventWaitHandle _ready = new AutoResetEvent (false);
  static EventWaitHandle _go = new AutoResetEvent (false);
  static readonly object _locker = new object();
  static string _message;

  static void Main()
  {
    new Thread (Work).Start();

    _ready.WaitOne();                  // First wait until worker is ready
    lock (_locker) _message = "ooo";
    _go.Set();                         // Tell worker to go

    _ready.WaitOne();
    lock (_locker) _message = "ahhh";  // Give the worker another message
    _go.Set();

    _ready.WaitOne();
    lock (_locker) _message = null;    // Signal the worker to exit
    _go.Set();
  }

  static void Work()
  {
    while (true)
    {
      _ready.Set();                    // Indicate that we're ready
      _go.WaitOne();                   // Wait to be kicked off...
      lock (_locker)
      {
        if (_message == null) return;  // Gracefully exit
        Console.WriteLine (_message);
      }
    }
  }
}
// Output:
ooo
ahhh


Figure 22-2 shows this process visually. 





Figure 22-2. Two-way signaling


Here, we're using a null message to indicate that the worker should end. With 
threads that run indefinitely, it’s important to have an exit strategy! 


ManualResetEvent 


As we described in Chapter 14, a ManualResetEvent functions like a simple gate. 
Calling Set opens the gate, allowing any number of threads calling WaitOne to be let 
through. Calling Reset closes the gate. Threads that call WaitOne on a closed gate 
will block; when the gate is next opened, they will be released all at once. Apart from 
these differences, a ManualResetEvent functions like an AutoResetEvent. 
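As a sketch of the gate behavior (the GateDemo class is hypothetical), several threads wait on a closed ManualResetEvent and a single Set releases all of them. Threads that reach WaitOne after Set also pass straight through, because the gate stays open until Reset is called.

```csharp
using System.Threading;

static class GateDemo
{
  // Starts threadCount threads at a closed gate, opens it once,
  // and returns how many threads got through.
  public static int ReleaseAll (int threadCount)
  {
    using var gate = new ManualResetEvent (false);   // closed
    int passed = 0;
    var threads = new Thread [threadCount];
    for (int i = 0; i < threadCount; i++)
    {
      threads [i] = new Thread (() =>
      {
        gate.WaitOne();                      // block until the gate opens
        Interlocked.Increment (ref passed);
      });
      threads [i].Start();
    }
    gate.Set();                                // one Set releases every waiter
    foreach (var t in threads) t.Join();
    return passed;
  }
}
```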


As with AutoResetEvent, you can construct a ManualResetEvent in two ways: 


var manual1 = new ManualResetEvent (false);
var manual2 = new EventWaitHandle (false, EventResetMode.ManualReset);
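To make the gate analogy concrete, here is a minimal sketch in which several threads block on a closed ManualResetEvent and a single Set releases all of them. GateDemo and ReleaseAll are illustrative names, not .NET APIs, and the 100 ms sleep assumes the threads reach WaitOne within that time (it serves only the sanity check, not the release itself):

```csharp
using System;
using System.Threading;

static class GateDemo
{
  // Starts 'waiters' threads that block on a closed gate, then opens the gate
  // once and returns how many threads were let through.
  public static int ReleaseAll (int waiters)
  {
    var gate = new ManualResetEvent (false);     // false = gate starts closed
    int passed = 0;
    var threads = new Thread[waiters];

    for (int i = 0; i < waiters; i++)
    {
      threads[i] = new Thread (() =>
      {
        gate.WaitOne();                          // Blocks while the gate is closed
        Interlocked.Increment (ref passed);
      });
      threads[i].Start();
    }

    Thread.Sleep (100);                          // Assume the threads reach WaitOne by now
    if (Volatile.Read (ref passed) != 0)
      throw new InvalidOperationException ("A thread got through a closed gate");

    gate.Set();                                  // A single Set releases every waiter
    foreach (Thread t in threads) t.Join();
    gate.Dispose();
    return passed;
  }
}
```

For example, GateDemo.ReleaseAll (5) should return 5. Had we used an AutoResetEvent instead, each Set would let through only one thread.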


There's another version of ManualResetEvent called
ManualResetEventSlim. The latter is optimized for short waiting
times, with the ability to opt into spinning for a set number
of iterations. It also has a more efficient managed implementation
and allows a Wait to be canceled via a CancellationToken.
ManualResetEventSlim doesn't subclass WaitHandle;
however, it exposes a WaitHandle property that returns a
WaitHandle-based object when called (with the performance
profile of a traditional wait handle).
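The following sketch exercises the three ManualResetEventSlim features just described: the spinCount constructor argument, cancellation through a CancellationToken, and the WaitHandle property. SlimDemo and its methods are hypothetical names; the spin count of 100 and the 200 ms timeout are arbitrary values chosen for illustration:

```csharp
using System;
using System.Threading;

static class SlimDemo
{
  // Returns true if a Wait on a never-set event is ended by cancellation.
  public static bool WaitIsCancelable()
  {
    // Spin up to 100 iterations before falling back to a real block
    using var mre = new ManualResetEventSlim (false, spinCount: 100);
    using var cts = new CancellationTokenSource (TimeSpan.FromMilliseconds (200));
    try
    {
      mre.Wait (cts.Token);     // The event is never set: only cancellation ends this
      return false;
    }
    catch (OperationCanceledException)
    {
      return true;
    }
  }

  // Shows that the WaitHandle property returns a WaitHandle that tracks Set.
  public static bool WaitHandleSeesSet()
  {
    using var mre = new ManualResetEventSlim (false);
    WaitHandle wh = mre.WaitHandle;   // Allocated on demand
    mre.Set();
    return wh.WaitOne (0);            // true: the handle reflects the Set
  }
}
```

The WaitHandle property is what lets a slim event take part in APIs such as WaitHandle.WaitAny, at the cost of the traditional wait handle's performance profile.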
Signaling Constructs and Performance


Waiting or signaling an AutoResetEvent or ManualResetEvent takes about one 
microsecond (assuming no blocking). 


ManualResetEventSlim and CountdownEvent can be up to 50 times faster in short- 
wait scenarios because of their nonreliance on the OS and judicious use of spinning 
constructs. 


In most scenarios, however, the overhead of the signaling classes themselves doesn't 
create a bottleneck; thus, it is rarely a consideration. 











A ManualResetEvent is useful in allowing one thread to unblock many other 
threads. The reverse scenario is covered by CountdownEvent. 


CountdownEvent 


CountdownEvent lets you wait on more than one thread. The class has an efficient, 
fully managed implementation. To use the class, instantiate it with the number of 
threads or “counts” that you want to wait on: 
var countdown = new CountdownEvent (3);   // Initialize with "count" of 3.


Calling Signal decrements the “count”; calling Wait blocks until the count goes 
down to zero: 


static CountdownEvent _countdown = new CountdownEvent (3);

static void Main()
{
  new Thread (SaySomething).Start ("I am thread 1");
  new Thread (SaySomething).Start ("I am thread 2");
  new Thread (SaySomething).Start ("I am thread 3");

  _countdown.Wait();   // Blocks until Signal has been called 3 times
  Console.WriteLine ("All threads have finished speaking!");
}

static void SaySomething (object thing)
{
  Thread.Sleep (1000);
  Console.WriteLine (thing);
  _countdown.Signal();
}


You can sometimes more easily solve problems for which 
CountdownEvent is effective by using the structured parallelism 
constructs that we describe in Chapter 23 (PLINQ and the 
Parallel class). 
You can reincrement a CountdownEvent’s count by calling AddCount. However, if it
has already reached zero, this throws an exception: you can't “unsignal” a
CountdownEvent by calling AddCount. To avoid the possibility of an exception being thrown,
you can instead call TryAddCount, which returns false if the countdown is zero.


To unsignal a countdown event, call Reset: this both unsignals the construct and 
resets its count to the original value. 
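Here's a minimal sketch of these AddCount, TryAddCount, and Reset rules (CountdownDemo is an illustrative name). AddCount succeeds only while the count is above zero; once the count hits zero, only Reset brings the event back to an unsignaled state:

```csharp
using System.Threading;

static class CountdownDemo
{
  public static (bool SetAfterSignal, bool Added, int CountAfterReset) Run()
  {
    using var countdown = new CountdownEvent (2);
    countdown.AddCount();                   // 2 -> 3: allowed because the count isn't zero yet
    countdown.Signal (3);                   // 3 -> 0: the event is now signaled
    bool setAfterSignal = countdown.IsSet;  // true
    bool added = countdown.TryAddCount();   // false: can't "unsignal" via AddCount
    countdown.Reset();                      // Unsignals and restores the original count (2)
    return (setAfterSignal, added, countdown.CurrentCount);
  }
}
```

CountdownDemo.Run() therefore returns (true, false, 2). Calling AddCount instead of TryAddCount at the same point would throw an InvalidOperationException.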


Like ManualResetEventSlim, CountdownEvent exposes a WaitHandle property for 
scenarios in which some other class or method expects an object based on 
WaitHandle. 


Creating a Cross-Process EventWaitHandle 


EventWaitHandle’s constructor allows a “named” EventWaitHandle to be created, 
capable of operating across multiple processes. The name is simply a string, and it 
can be any value that doesn't unintentionally conflict with someone else’s! If the 
name is already in use on the computer, you get a reference to the same underlying 
EventWaitHandle; otherwise, the OS creates a new one. Here’s an example: 


EventWaitHandle wh = new EventWaitHandle (false, EventResetMode.AutoReset,
                                          @"Global\MyCompany.MyApp.SomeName");


If two applications each ran this code, they would be able to signal each other: the 
wait handle would work across all threads in both processes. 


Named event wait handles are available only on Windows. 


Wait Handles and Continuations 


Rather than waiting on a wait handle (and blocking your thread), you can attach a 
continuation to it by calling ThreadPool.RegisterWaitForSingleObject. This 
method accepts a delegate that is executed when a wait handle is signaled: 


static ManualResetEvent _starter = new ManualResetEvent (false); 


public static void Main() 
{ 
RegisteredWaitHandle reg = ThreadPool.RegisterWaitForSingleObject 
(_starter, Go, "Some Data", -1, true); 
Thread.Sleep (5000); 
Console.WriteLine ("Signaling worker..."); 
_starter.Set(); 
Console.ReadLine(); 


reg.Unregister (_starter); // Clean up when we're done. 
} 
public static void Go (object data, bool timedOut) 
{ 
Console.WriteLine ("Started - " + data); 
// Perform task... 
} 
// Output:
(5 second delay)
Signaling worker...
Started - Some Data

When the wait handle is signaled (or a timeout elapses), the delegate runs on a
pooled thread. You are then supposed to call Unregister to release the unmanaged
handle to the callback.


In addition to the wait handle and delegate, RegisterWaitForSingleObject accepts 
a black box object that it passes to your delegate method (rather like 
ParameterizedThreadStart) as well as a timeout in milliseconds (-1 meaning no 
timeout) and a Boolean flag indicating whether the request is one-off rather than 
recurring. 
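The timeout and one-off arguments can be seen in isolation with the following sketch, which registers a callback on a handle that is never signaled. RegisteredWaitDemo and TimesOut are illustrative names, and the 100 ms timeout is arbitrary. Because the handle is never set, the pool invokes the callback with timedOut equal to true:

```csharp
using System.Threading;

static class RegisteredWaitDemo
{
  public static bool TimesOut()
  {
    var neverSet = new ManualResetEvent (false);
    var done = new ManualResetEvent (false);
    bool timedOut = false;

    RegisteredWaitHandle reg = ThreadPool.RegisterWaitForSingleObject (
      neverSet,
      (state, didTimeOut) => { timedOut = didTimeOut; done.Set(); },
      null,    // The "black box" state object (unused here)
      100,     // Timeout in milliseconds
      true);   // One-off rather than recurring

    done.WaitOne();          // Wait for the callback to run
    reg.Unregister (null);   // Release the registration; no handle to signal
    return timedOut;
  }
}
```

Passing false as the last argument would make the callback recur every 100 ms until Unregister is called.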


You can reliably call RegisterWaitForSingleObject only 
once per wait handle. Calling this method again on the same 
wait handle causes an intermittent failure, whereby an 
unsignaled wait handle fires a callback as though it were 
signaled. 

This limitation makes (the nonslim) wait handles poorly 
suited to asynchronous programming. 


WaitAny, WaitAll, and SignalAndWait 


In addition to the Set, WaitOne, and Reset methods, there are static methods on the
WaitHandle class to crack more complex synchronization nuts. The WaitAny,
WaitAll, and SignalAndWait methods perform signaling and waiting operations on
multiple handles. The wait handles can be of differing types (including Mutex and
Semaphore, given that these also derive from the abstract WaitHandle class).
ManualResetEventSlim and CountdownEvent can also partake in these methods via their
WaitHandle properties.


WaitAll and SignalAndWait have a weird connection to the
legacy COM architecture: these methods require that the
caller be in a multithreaded apartment, the model least
suitable for interoperability. The main thread of a WPF or
Windows Forms application, for example, is unable to interact
with the clipboard in this mode. We discuss alternatives
shortly.


WaitHandle.WaitAny waits for any one of an array of wait handles;
WaitHandle.WaitAll waits on all of the given handles, atomically. This means that
if you wait on two AutoResetEvents:

• WaitAny will never end up latching both events.

• WaitAll will never end up latching only one event.
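The following sketch (WaitAnyDemo is an illustrative name) signals two AutoResetEvents before calling WaitAny. WaitAny reports the lowest index among the signaled handles and latches only that one, so a second, zero-timeout WaitAny finds the other event still signaled:

```csharp
using System.Threading;

static class WaitAnyDemo
{
  public static (int First, int Second, int Third) Run()
  {
    var a = new AutoResetEvent (false);
    var b = new AutoResetEvent (false);
    var handles = new WaitHandle[] { a, b };
    a.Set();
    b.Set();

    int first = WaitHandle.WaitAny (handles);      // 0: lowest signaled index; only a is reset
    int second = WaitHandle.WaitAny (handles, 0);  // 1: b was not latched by the first call
    int third = WaitHandle.WaitAny (handles, 0);   // WaitHandle.WaitTimeout: both now reset
    return (first, second, third);
  }
}
```

Had the first WaitAny latched both events, the second call would have timed out instead of returning 1.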
SignalAndWait calls Set on one WaitHandle and then calls WaitOne on another
WaitHandle. After signaling the first handle, it will jump to the head of the queue in 
waiting on the second handle; this helps it succeed (although the operation is not 
truly atomic). You can think of this method as swapping one signal for another, and 
use it on a pair of EventWaitHandles to set up two threads to rendezvous or meet at 
the same point in time. Either AutoResetEvent or ManualResetEvent will do the 
trick. The first thread executes the following: 


WaitHandle.SignalAndWait (wh1, wh2); 
The second thread does the opposite: 


WaitHandle.SignalAndWait (wh2, wh1); 


Alternatives to WaitAll and SignalAndWait 


WaitAll and SignalAndWait won't run in a single-threaded apartment. Fortunately,
there are alternatives. In the case of SignalAndWait, it’s rare that you need its
queue-jumping semantics: in our rendezvous example, for instance, it would be valid
simply to call Set on the first wait handle, and then WaitOne on the other, if wait
handles were used solely for that rendezvous. In the following section, we explore
yet another option for implementing a thread rendezvous.
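As a minimal sketch of that Set-then-WaitOne alternative (RendezvousDemo and Meet are hypothetical names), the following two threads each signal their own AutoResetEvent and then wait on the partner's. Because neither thread calls WaitAll or SignalAndWait, this also works from a single-threaded apartment. The second thread sleeps briefly so that the first one genuinely has to wait:

```csharp
using System.Threading;

static class RendezvousDemo
{
  // Non-atomic stand-in for WaitHandle.SignalAndWait (mine, theirs).
  static void Meet (EventWaitHandle mine, EventWaitHandle theirs)
  {
    mine.Set();
    theirs.WaitOne();
  }

  // Returns true if each thread, once past the rendezvous, sees that its partner arrived.
  public static bool Run()
  {
    var wh1 = new AutoResetEvent (false);
    var wh2 = new AutoResetEvent (false);
    int arrivals = 0;
    bool ok1 = false, ok2 = false;

    var t1 = new Thread (() =>
    {
      Interlocked.Increment (ref arrivals);
      Meet (wh1, wh2);
      ok1 = Volatile.Read (ref arrivals) == 2;
    });
    var t2 = new Thread (() =>
    {
      Thread.Sleep (100);                     // Arrive late
      Interlocked.Increment (ref arrivals);
      Meet (wh2, wh1);
      ok2 = Volatile.Read (ref arrivals) == 2;
    });
    t1.Start(); t2.Start();
    t1.Join(); t2.Join();
    return ok1 && ok2;
  }
}
```

The handles here serve a single rendezvous, matching the proviso above. Reusing the same pair of AutoResetEvents for repeated rendezvous this way can lose a signal (setting an already-set AutoResetEvent has no further effect) and deadlock.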


In the case of WaitAny and WaitAll, if you don’t need atomicity, you can use the
code we wrote in the previous section to convert the wait handles to tasks and then
use Task.WhenAny and Task.WhenAll (Chapter 14).
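As a minimal sketch of such a conversion, assuming a hypothetical ToTask extension method of our own (not a .NET API), we can wrap ThreadPool.RegisterWaitForSingleObject in a TaskCompletionSource, registering only once per handle in line with the earlier warning. Task.WhenAll then serves as a non-atomic, non-blocking stand-in for WaitAll:

```csharp
using System.Threading;
using System.Threading.Tasks;

static class WaitHandleExtensions
{
  // Hypothetical helper: completes a Task when the handle is signaled,
  // without blocking a thread while waiting.
  public static Task ToTask (this WaitHandle handle)
  {
    var tcs = new TaskCompletionSource<object> (
      TaskCreationOptions.RunContinuationsAsynchronously);
    RegisteredWaitHandle reg = ThreadPool.RegisterWaitForSingleObject (
      handle,
      (state, timedOut) => tcs.TrySetResult (null),
      null,
      -1,      // No timeout
      true);   // One-off registration
    // Release the registration once the task has completed.
    tcs.Task.ContinueWith (_ => reg.Unregister (null), TaskScheduler.Default);
    return tcs.Task;
  }
}

static class WhenAllDemo
{
  public static async Task<bool> Run()
  {
    var a = new ManualResetEvent (false);
    var b = new ManualResetEvent (false);
    Task all = Task.WhenAll (a.ToTask(), b.ToTask());

    a.Set();
    bool completedEarly = all.IsCompleted;   // false: b hasn't been signaled yet
    b.Set();

    bool completed = await Task.WhenAny (all, Task.Delay (5000)) == all;
    return !completedEarly && completed;
  }
}
```

Unlike WaitAll, nothing here latches the handles atomically, and no thread sits blocked while the handles are unsignaled.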


If you need atomicity, you can take the lowest-level approach to signaling and write 
the logic yourself with Monitor’s Wait and Pulse methods. We describe Wait and 
Pulse in detail online. 


The Barrier Class 


The Barrier class implements a thread execution barrier, allowing many threads to 
rendezvous at a point in time (not to be confused with Thread.MemoryBarrier). 
The class is very fast and efficient, and is built upon Wait, Pulse, and spinlocks. 


Here's how to use this class:

1. Instantiate it, specifying how many threads should partake in the rendezvous
(you can change this later by calling AddParticipants/RemoveParticipants).
2. Have each thread call SignalAndWait when it wants to rendezvous.

Instantiating Barrier with a value of 3 causes SignalAndWait to block until that
method has been called three times. It then starts over: calling SignalAndWait again
blocks until called another three times. This keeps each thread “in step” with every
other thread.
In the following example, each of three threads writes the numbers 0 through 4
while keeping in step with the other threads: 


static Barrier _barrier = new Barrier (3);

static void Main()
{
  new Thread (Speak).Start();
  new Thread (Speak).Start();
  new Thread (Speak).Start();
}

static void Speak()
{
  for (int i = 0; i < 5; i++)
  {
    Console.Write (i + " ");
    _barrier.SignalAndWait();
  }
}

// Output: 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4


A really useful feature of Barrier is that you can also specify a post-phase action 
when constructing it. This is a delegate that runs after SignalAndWait has been 
called n times, but before the threads are unblocked (as shown in the shaded area in 
Figure 22-3). In our example, if we instantiate our barrier as follows: 


static Barrier _barrier = new Barrier (3, barrier => Console.WriteLine()); 


the output is this: 


0 0 0
1 1 1
2 2 2
3 3 3
4 4 4


A post-phase action can be useful for coalescing data from each of the worker 
threads. It doesn’t need to worry about preemption, because all workers are blocked 
while it does its thing. 
Figure 22-3. Barrier
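A post-phase action can coalesce per-worker results, as in the following sketch. BarrierCoalesceDemo is an illustrative name, and the arithmetic stands in for real work. Each worker writes only to its own slot, and the action sums the slots once per phase while every worker is blocked in SignalAndWait:

```csharp
using System.Threading;

static class BarrierCoalesceDemo
{
  public static int[] Run (int phases = 3)
  {
    var partials = new int[3];
    var phaseTotals = new int[phases];
    int phase = 0;                     // Touched only by the post-phase action

    using var barrier = new Barrier (3, _ =>
    {
      // No locks needed: all three workers are blocked while this runs.
      phaseTotals[phase++] = partials[0] + partials[1] + partials[2];
    });

    var threads = new Thread[3];
    for (int w = 0; w < 3; w++)
    {
      int worker = w;                  // Stable copy for the lambda
      threads[w] = new Thread (() =>
      {
        for (int p = 0; p < phases; p++)
        {
          partials[worker] = (worker + 1) * (p + 1);   // Stand-in for real work
          barrier.SignalAndWait();
        }
      });
      threads[w].Start();
    }
    foreach (Thread t in threads) t.Join();
    return phaseTotals;
  }
}
```

With three phases, the totals are 6, 12, and 18, since the workers contribute 1, 2, and 3 times the (1-based) phase number.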


Lazy Initialization 


A frequent problem in threading is how to lazily initialize a shared field in a
thread-safe fashion. The need arises when you have a field of a type that’s expensive to
construct:


class Foo 


{ 


public readonly Expensive Expensive = new Expensive(); 


} 


class Expensive { /* Suppose this is expensive to construct */ } 


The problem with this code is that instantiating Foo incurs the performance cost of 
instantiating Expensive—regardless of whether the Expensive field is ever accessed. 
The obvious answer is to construct the instance on demand: 


class Foo
{
  Expensive _expensive;
  public Expensive Expensive       // Lazily instantiate Expensive
  {
    get
    {
      if (_expensive == null) _expensive = new Expensive();
      return _expensive;
    }
  }
}


The question then arises: is this thread-safe? Aside from the fact that we're accessing
_expensive outside a lock without a memory barrier, consider what would happen
if two threads accessed this property at once. They could both satisfy the if statement’s
predicate, and each thread could end up with a different instance of Expensive.
Because this can lead to subtle errors, we would say, in general, that this code is not
thread-safe.


The solution to the problem is to lock around checking and initializing the object: 


Expensive _expensive; 
readonly object _expenseLock = new object(); 


public Expensive Expensive 


{ 
get 


{ 


lock (_expenseLock) 


{ 


if (_expensive == null) _expensive = new Expensive(); 
return _expensive; 


} 
} 
} 


Lazy<T> 


The Lazy<T> class is available to help with lazy initialization. If instantiated with an 
argument of true, it implements the thread-safe initialization pattern just described. 


Lazy<T> actually implements a micro-optimized version of 
this pattern, called double-checked locking. Double-checked 
locking performs an additional volatile read to avoid the cost 
of obtaining a lock if the object is already initialized. 


To use Lazy<T>, instantiate the class with a value factory delegate that tells it how to 
initialize a new value, and the argument true. Then, access its value via the Value 
property: 


Lazy<Expensive> _expensive = new Lazy<Expensive> 
(() => new Expensive(), true); 


public Expensive Expensive { get { return _expensive.Value; } } 


If you pass false into Lazy<T>’s constructor, it implements the thread-unsafe lazy 
initialization pattern that we described at the beginning of this section—this makes 
sense when you want to use Lazy<T> in a single-threaded context. 
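To see the thread-safe mode in action, the following sketch has eight tasks read Value concurrently and counts how many times the factory runs. LazyDemo is an illustrative name, and the 50 ms sleep merely simulates an expensive construction:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class LazyDemo
{
  static int _factoryCalls;

  public static (int FactoryCalls, bool SameInstance) Run()
  {
    _factoryCalls = 0;
    var lazy = new Lazy<object> (() =>
    {
      Interlocked.Increment (ref _factoryCalls);
      Thread.Sleep (50);                 // Simulate an expensive construction
      return new object();
    }, true);                            // true = thread-safe initialization

    Task<object>[] readers = Enumerable.Range (0, 8)
      .Select (_ => Task.Run (() => lazy.Value))
      .ToArray();                        // ToArray starts every task before we wait

    object[] seen = readers.Select (t => t.Result).ToArray();
    return (_factoryCalls, seen.All (o => ReferenceEquals (o, seen[0])));
  }
}
```

With true passed to the constructor, Run() should report one factory call and a single shared instance; with false, concurrent readers could run the factory more than once.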


LazyInitializer

LazyInitializer is a static class that works exactly like Lazy<T> except:

• Its functionality is exposed through a static method that operates directly on a
field in your own type. This avoids a level of indirection, improving performance
in cases where you need extreme optimization.

• It offers another mode of initialization in which multiple threads can race to
initialize.


To use LazyInitializer, call EnsureInitialized before accessing the field, passing
a reference to the field, a reference to a lock object, and the factory delegate:

Expensive _expensive;
object _expensiveLock;   // EnsureInitialized creates the lock object on demand
public Expensive Expensive
{
  get   // Implement double-checked locking
  {
    LazyInitializer.EnsureInitialized (ref _expensive, ref _expensiveLock,
                                       () => new Expensive());
    return _expensive;
  }
}

If you instead call the overload that accepts just the field and the factory
delegate, competing threads race to initialize. This sounds similar to our original
thread-unsafe example except that the first thread to finish always wins, and so you
end up with only one instance. The advantage of this technique is that it's even
faster (on multicores) than double-checked locking because it can be implemented
entirely without locks using advanced techniques that we describe in “NonBlocking
Synchronization” and “Lazy Initialization” online. This is an extreme (and rarely
needed) optimization that comes at a cost:


• It's slower when more threads race to initialize than you have cores.

• It potentially wastes CPU resources performing redundant initialization.

• The initialization logic must be thread-safe (in this case, it would be
thread-unsafe if Expensive’s constructor wrote to static fields, for instance).

• If the initializer instantiates an object requiring disposal, the “wasted” object
won't be disposed without additional logic.
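The race-to-initialize mode can be sketched as follows. Per the .NET documentation, it is the overload EnsureInitialized (ref target, valueFactory), which takes just the field and the factory delegate, that lets threads race: the factory may run more than once, but only one result is stored, and every caller gets that stored instance. RaceToInitializeDemo is an illustrative name, and the ManualResetEventSlim merely releases the tasks together to encourage a race:

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class RaceToInitializeDemo
{
  static object _instance;       // Field initialized in place by LazyInitializer
  static int _factoryCalls;

  public static (bool SameInstance, int FactoryCalls) Run()
  {
    _instance = null;
    _factoryCalls = 0;
    using var start = new ManualResetEventSlim (false);

    Task<object>[] tasks = Enumerable.Range (0, 8).Select (_ => Task.Run (() =>
    {
      start.Wait();              // Release all tasks together
      return LazyInitializer.EnsureInitialized (ref _instance, () =>
      {
        Interlocked.Increment (ref _factoryCalls);   // May exceed 1 under a race
        return new object();
      });
    })).ToArray();

    start.Set();
    object[] results = tasks.Select (t => t.Result).ToArray();
    return (results.All (r => ReferenceEquals (r, _instance)), _factoryCalls);
  }
}
```

SameInstance should always be true, while FactoryCalls varies from run to run (between 1 and 8), which is exactly the "redundant initialization" cost listed above.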


Thread-Local Storage 


Much of this chapter has focused on synchronization constructs and the issues arising
from having threads concurrently access the same data. Sometimes, however,
you want to keep data isolated, ensuring that each thread has a separate copy. Local
variables achieve exactly this, but they are useful only with transient data.


The solution is thread-local storage. You might be hard-pressed to think of a
requirement: data you'd want to keep isolated to a thread tends to be transient by nature. Its
main application is for storing “out-of-band” data, which supports the execution
path’s infrastructure, such as messaging, transaction, and security tokens. Passing
such data around in method parameters can be clumsy and can alienate all but
your own methods; storing such information in ordinary static fields means sharing
it among all threads.
Thread-local storage can also be useful in optimizing parallel code. It allows each
thread to exclusively access its own version of a thread-unsafe object without needing
locks, and without needing to reconstruct that object between method calls.


There are four ways to implement thread-local storage. We take a look at them in 
the following subsections. 


[ThreadStatic] 


The easiest approach to thread-local storage is to mark a static field with the
ThreadStatic attribute:


[ThreadStatic] static int _x; 


Each thread then sees a separate copy of _x. 


Unfortunately, [ThreadStatic] doesn’t work with instance fields (it simply does 
nothing); nor does it play well with field initializers—they execute only once on the 
thread that’s running when the static constructor executes. If you need to work with 
instance fields—or start with a nondefault value—ThreadLocal<T> provides a better 
option. 
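The following sketch (ThreadStaticDemo is an illustrative name) shows both behaviors: each thread sees its own copy of _x, and the initializer on _y runs only on the thread that triggers the static constructor (here, the calling thread), so a worker thread sees 0 rather than 5:

```csharp
using System.Threading;

static class ThreadStaticDemo
{
  [ThreadStatic] static int _x;
  [ThreadStatic] static int _y = 5;   // Initializer runs once, on one thread only

  public static (int MainX, int WorkerX, int WorkerY) Run()
  {
    _x = 10;                          // Sets only the calling thread's copy
    int workerX = -1, workerY = -1;

    var t = new Thread (() =>
    {
      _x += 1;                        // Worker's copy starts at 0, so this yields 1
      workerX = _x;
      workerY = _y;                   // 0, not 5: the initializer didn't run here
    });
    t.Start();
    t.Join();
    return (_x, workerX, workerY);
  }
}
```

Run() returns (10, 1, 0): the worker's increment didn't touch the caller's copy, and the worker never saw the initializer's value.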


ThreadLocal<T> 


ThreadLocal<T> provides thread-local storage for both static and instance fields, 
and allows you to specify default values. 


Here's how to create a ThreadLocal<int> with a default value of 3 for each thread:

static ThreadLocal<int> _x = new ThreadLocal<int> (() => 3);


You then use _x’s Value property to get or set its thread-local value. A bonus of 
using ThreadLocal is that values are lazily evaluated: the factory function evaluates 
on the first call (for each thread). 
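A minimal sketch of per-thread defaults (ThreadLocalDemo is an illustrative name): each thread's first read of Value runs the factory, so every thread starts from 3 regardless of what other threads have done with their own copies:

```csharp
using System.Threading;

static class ThreadLocalDemo
{
  static readonly ThreadLocal<int> _x = new ThreadLocal<int> (() => 3);

  public static int[] Run()
  {
    var results = new int[2];
    var threads = new Thread[2];
    for (int i = 0; i < 2; i++)
    {
      int n = i;                       // Stable copy for the lambda
      threads[i] = new Thread (() =>
      {
        _x.Value += n + 1;             // Factory supplies 3 on this thread's first access
        results[n] = _x.Value;
      });
      threads[i].Start();
    }
    foreach (Thread t in threads) t.Join();

    // The calling thread's copy is still the default.
    return new[] { _x.Value, results[0], results[1] };
  }
}
```

Run() returns { 3, 4, 5 }: the two workers each started from 3 and added 1 and 2 respectively, while the caller's value was untouched.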


ThreadLocal<T> and instance fields 


ThreadLocal<T> is also useful with instance fields and captured local variables. For 
example, consider the problem of generating random numbers in a multithreaded 
environment. The Random class is not thread-safe, so we have to either lock around 
using Random (limiting concurrency) or generate a separate Random object for each 
thread. ThreadLocal<T> makes the latter easy: 


var localRandom = new ThreadLocal<Random> (() => new Random());
Console.WriteLine (localRandom.Value.Next());


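Our factory function for creating the Random object is a bit simplistic, though. In
.NET Framework, Random’s parameterless constructor relies on the system clock for a
random number seed, which may be the same for two Random objects created within
~10 ms of each other. (.NET Core instead derives default seeds from a thread-static
pseudo-random generator, so this mainly matters for code that also targets .NET
Framework.) Here’s one way to guarantee distinct seeds regardless: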
var localRandom = new ThreadLocal<Random>
  (() => new Random (Guid.NewGuid().GetHashCode()));


We use this in Chapter 23 (see the parallel spellchecking example in “PLINQ” on 
page 927). 


GetData and SetData 


The third approach is to use two methods in the Thread class: GetData and SetData.
These store data in thread-specific “slots.” Thread.GetData reads from a thread's
isolated data store; Thread.SetData writes to it. Both methods require a
LocalDataStoreSlot object to identify the slot. You can use the same slot across all
threads and they'll still get separate values. Here’s an example:


class Test
{
  // The same LocalDataStoreSlot object can be used across all threads.
  LocalDataStoreSlot _secSlot = Thread.GetNamedDataSlot ("securityLevel");

  // This property has a separate value on each thread.
  int SecurityLevel
  {
    get
    {
      object data = Thread.GetData (_secSlot);
      return data == null ? 0 : (int) data;    // null == uninitialized
    }
    set { Thread.SetData (_secSlot, value); }
  }
}


In this instance, we called Thread.GetNamedDataSlot, which creates a named slot;
this allows sharing of that slot across the application. Alternatively, you can control
a slot’s scope yourself with an unnamed slot, obtained by calling
Thread.AllocateDataSlot:


class Test
{
  LocalDataStoreSlot _secSlot = Thread.AllocateDataSlot();
  ...
}


Thread.FreeNamedDataSlot will release a named data slot across all threads, but 
only once all references to that LocalDataStoreSlot have dropped out of scope and 
have been garbage-collected. This ensures that threads don't have data slots pulled 
out from under their feet, as long as they keep a reference to the appropriate 
LocalDataStoreSlot object while the slot is needed. 
AsyncLocal<T>


The approaches to thread-local storage that we've discussed so far are incompatible
with asynchronous functions, because after an await, execution can resume on a
different thread. The AsyncLocal<T> class solves this by preserving its value across
an await:


static AsyncLocal<string> _asyncLocalTest = new AsyncLocal<string>();

async Task Main()
{
  _asyncLocalTest.Value = "test";

  await Task.Delay (1000);

  // The following works even if we come back on another thread:
  Console.WriteLine (_asyncLocalTest.Value);   // test
}

AsyncLocal<T> is still able to keep operations started on separate threads apart,
whether initiated by Thread.Start or Task.Run. The following writes “one one” and
“two two”:


static AsyncLocal<string> _asyncLocalTest = new AsyncLocal<string>();

void Main()
{
  // Call Test twice on two concurrent threads:
  new Thread (() => Test ("one")).Start();
  new Thread (() => Test ("two")).Start();
}

async void Test (string value)
{
  _asyncLocalTest.Value = value;
  await Task.Delay (1000);
  Console.WriteLine (value + " " + _asyncLocalTest.Value);
}

AsyncLocal<T> has an interesting and unique nuance: if an AsyncLocal<T> object
already has a value when a thread is started, the new thread will “inherit” that value:


static AsyncLocal<string> _asyncLocalTest = new AsyncLocal<string>(); 


void Main() 


{ 


_asyncLocalTest.Value = "test"; 
new Thread (AnotherMethod).Start(); 


} 


void AnotherMethod() => Console.WriteLine (_asyncLocalTest.Value); // test 


The new thread, however, gets a copy of the value, so any changes that it makes will 
not affect the original: 


static AsyncLocal<string> _asyncLocalTest = new AsyncLocal<string>();

void Main()
{
  _asyncLocalTest.Value = "test";
  var t = new Thread (AnotherMethod);
  t.Start(); t.Join();
  Console.WriteLine (_asyncLocalTest.Value);   // test (not ha-ha!)
}

void AnotherMethod() => _asyncLocalTest.Value = "ha-ha!";


Keep in mind that the new thread gets a shallow copy of the value. So, if you were to
replace AsyncLocal<string> with AsyncLocal<StringBuilder> or AsyncLocal<List<string>>,
the new thread could clear the StringBuilder, or add/remove items to the
List<string>, and this would affect the original.
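The shallow-copy behavior can be demonstrated with a StringBuilder, as in this sketch (AsyncLocalShallowCopyDemo is an illustrative name). The new thread inherits a reference to the same object, so its Append is visible to the original thread, whereas assigning a new value to _local.Value inside the thread would not be:

```csharp
using System.Text;
using System.Threading;

static class AsyncLocalShallowCopyDemo
{
  static readonly AsyncLocal<StringBuilder> _local = new AsyncLocal<StringBuilder>();

  public static (string Original, bool SameReference) Run()
  {
    _local.Value = new StringBuilder ("test");
    StringBuilder mine = _local.Value;
    bool same = false;

    var t = new Thread (() =>
    {
      same = ReferenceEquals (_local.Value, mine);   // Inherited: the same object
      _local.Value.Append (" ha-ha!");               // Mutates the shared object
    });
    t.Start(); t.Join();

    return (_local.Value.ToString(), same);
  }
}
```

Run() returns ("test ha-ha!", true).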


Timers 


If you need to execute some method repeatedly at regular intervals, the easiest way 
is with a timer. Timers are convenient and efficient in their use of memory and 
resources—compared with techniques such as the following: 


new Thread (delegate() { 
while (enabled) 


{ 


DoSomeAction(); 
Thread.Sleep (TimeSpan.FromHours (24)); 


} 
}).Start(); 


Not only does this permanently tie up a thread resource, but without additional 
coding, DoSomeAction will happen at a later time each day. Timers solve these 
problems. 


.NET Core provides four timers. Two of these are general-purpose multithreaded
timers:

• System.Threading.Timer
• System.Timers.Timer

The other two are special-purpose single-threaded timers:

• System.Windows.Forms.Timer (Windows Forms timer)
• System.Windows.Threading.DispatcherTimer (WPF timer)

The multithreaded timers are more powerful, accurate, and flexible; the
single-threaded timers are safer and more convenient for running simple tasks that update
Windows Forms controls or WPF elements.
Multithreaded Timers


System.Threading.Timer is the simplest multithreaded timer: it has just a
constructor and two methods (a delight for minimalists, as well as book authors!). In
the following example, a timer calls the Tick method, which writes “tick...” after five
seconds have elapsed, and then every second after that, until the user presses Enter:


using System;
using System.Threading;

class Program
{
  static void Main()
  {
    // First interval = 5000ms; subsequent intervals = 1000ms
    Timer tmr = new Timer (Tick, "tick...", 5000, 1000);
    Console.ReadLine();
    tmr.Dispose();         // This both stops the timer and cleans up.
  }

  static void Tick (object data)
  {
    // This runs on a pooled thread
    Console.WriteLine (data);          // Writes "tick..."
  }
}


See “Timers” on page 543 in Chapter 12 for a discussion on 
disposing multithreaded timers. 


You can change a timer’s interval later by calling its Change method. If you want a
timer to fire just once, specify Timeout.Infinite in the constructor’s last argument.
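As a sketch of a one-shot timer, the following passes Timeout.Infinite as the period and then uses Change to schedule one more firing. OneShotTimerDemo is an illustrative name; the 50 ms due time and 200 ms pauses are arbitrary and assume a reasonably unloaded machine:

```csharp
using System.Threading;

static class OneShotTimerDemo
{
  // Returns how many times the callback ran.
  public static int Run()
  {
    int fired = 0;
    using var firedEvent = new AutoResetEvent (false);

    using var tmr = new Timer (_ =>
    {
      Interlocked.Increment (ref fired);
      firedEvent.Set();
    }, null, 50, Timeout.Infinite);       // Due in 50 ms; no repeat

    firedEvent.WaitOne (5000);
    Thread.Sleep (200);                   // A periodic timer would have fired again by now
    tmr.Change (50, Timeout.Infinite);    // Reschedule one more single firing
    firedEvent.WaitOne (5000);
    Thread.Sleep (200);
    return Volatile.Read (ref fired);
  }
}
```

Run() should return 2: one firing from the constructor's schedule and one from Change, with no repeats in between.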


.NET Core provides another timer class of the same name in the System.Timers
namespace. This simply wraps the System.Threading.Timer, providing additional
convenience while using the identical underlying engine. Here’s a summary of its
added features:


• An IComponent implementation, allowing it to be sited in the Visual Studio
Designer’s component tray
• An Interval property instead of a Change method
• An Elapsed event instead of a callback delegate
• An Enabled property to start and stop the timer (its default value being false)
• Start and Stop methods in case you're confused by Enabled
• An AutoReset flag for indicating a recurring event (default value is true)
• A SynchronizingObject property with Invoke and BeginInvoke methods for
safely calling methods on WPF elements and Windows Forms controls
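Here's a sketch of the AutoReset flag (AutoResetDemo is an illustrative name; the using alias avoids a clash with System.Threading.Timer, and the 50 ms interval and 300 ms pause are arbitrary). With AutoReset set to false, Elapsed fires once and the timer sets Enabled back to false:

```csharp
using System.Threading;
using Timer = System.Timers.Timer;   // Prefer System.Timers.Timer over System.Threading.Timer

static class AutoResetDemo
{
  public static (int Fired, bool EnabledAfter) Run()
  {
    int fired = 0;
    using var elapsed = new ManualResetEventSlim (false);
    using var tmr = new Timer (50) { AutoReset = false };   // Interval = 50 ms, one-shot

    tmr.Elapsed += (sender, e) =>
    {
      Interlocked.Increment (ref fired);
      elapsed.Set();
    };
    tmr.Enabled = true;              // Same effect as calling Start()

    elapsed.Wait (5000);
    Thread.Sleep (300);              // Leave room for any (unwanted) further firings
    return (Volatile.Read (ref fired), tmr.Enabled);
  }
}
```

Run() should return (1, false). Leaving AutoReset at its default of true would instead raise Elapsed every 50 ms until Stop or Dispose.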


Here’s an example: 


using System; 
using System.Timers; // Timers namespace rather than Threading 


class SystemTimer 


{ 

static void Main() 

{ 
Timer tmr = new Timer(); // Doesn't require any args 
tmr.Interval = 500; 
tmr.Elapsed += tmr_Elapsed; // Uses an event instead of a delegate 
tmr.Start(); // Start the timer 
Console.ReadLine(); 
tmr.Stop(); // Stop the timer 
Console.ReadLine(); 
tmr.Start(); // Restart the timer 
Console.ReadLine(); 
tmr.Dispose(); // Permanently stop the timer 

} 

static void tmr_Elapsed (object sender, EventArgs e) 

{ 
Console.WriteLine ("Tick"); 

} 


} 


Multithreaded timers use the thread pool to allow a few threads to serve many
timers. This means that the callback method or Elapsed event can fire on a different
thread each time it is called. Furthermore, the Elapsed event always fires
(approximately) on time, regardless of whether the previous Elapsed event
finished executing. Hence, callbacks or event handlers must be thread-safe.
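The following sketch makes the handler deliberately slower than the interval, checks that it runs on pooled threads, and updates its shared counter with Interlocked so that overlapping invocations don't lose increments. PooledElapsedDemo is an illustrative name; the 20 ms interval and 50 ms handler delay are arbitrary:

```csharp
using System.Threading;
using Timer = System.Timers.Timer;

static class PooledElapsedDemo
{
  public static (bool AllOnPool, int Ticks) Run()
  {
    int ticks = 0, offPool = 0;
    using var tmr = new Timer (20);            // AutoReset defaults to true

    tmr.Elapsed += (sender, e) =>
    {
      if (!Thread.CurrentThread.IsThreadPoolThread)
        Interlocked.Increment (ref offPool);
      Thread.Sleep (50);                       // Outlasts the interval: handlers may overlap
      Interlocked.Increment (ref ticks);       // Thread-safe update of shared state
    };

    tmr.Start();
    Thread.Sleep (500);
    tmr.Stop();
    Thread.Sleep (200);                        // Let in-flight handlers finish
    return (Volatile.Read (ref offPool) == 0, Volatile.Read (ref ticks));
  }
}
```

A plain ticks++ here could lose updates when two handler invocations overlap; Interlocked.Increment avoids that without a lock.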


The precision of multithreaded timers depends on the OS and is typically in the
10- to 20-millisecond region. If you need greater precision, you can use native interop
and call the Windows multimedia timer. This has precision down to one millisecond
and it is defined in winmm.dll. First call timeBeginPeriod to inform the OS
that you need high timing precision, and then call timeSetEvent to start a multimedia
timer. When you're done, call timeKillEvent to stop the timer and timeEndPeriod
to inform the OS that you no longer need high timing precision. Chapter 25
demonstrates calling external methods with P/Invoke. You can find complete examples
on the internet that use the multimedia timer by searching for the keywords
dllimport winmm.dll timesetevent.


Single-Threaded Timers 


.NET Core provides timers designed to eliminate thread-safety issues for WPF and 
Windows Forms applications: 
• System.Windows.Threading.DispatcherTimer (WPF)
• System.Windows.Forms.Timer (Windows Forms)


The single-threaded timers are not designed to work outside
their respective environments. If you use a Windows Forms
timer in a Windows Service application, for instance, the
Timer event won't fire!


Both are like System.Timers.Timer in the members that they expose (Interval,
Start, and Stop, plus Tick, which is equivalent to Elapsed) and are used in a
similar manner. However, they differ in how they work internally. Instead of firing
timer events on pooled threads, they post the events to the WPF or Windows Forms
message loop. This means that the Tick event always fires on the same thread that
originally created the timer, which, in a normal application, is the same thread
used to manage all user interface elements and controls. This has a number of
benefits:


• You can forget about thread safety.
• A fresh Tick will never fire until the previous Tick has finished processing.
• You can update user interface elements and controls directly from Tick event
handling code without calling Control.BeginInvoke or Dispatcher.BeginInvoke.


Thus, a program employing these timers is not really multithreaded: you end up
with the same kind of pseudoconcurrency that’s described in Chapter 14 with
asynchronous functions that execute on a UI thread. One thread serves all timers as well
as processing UI events, which means that the Tick event handler must execute
quickly; otherwise, the UI becomes unresponsive.


This makes the WPF and Windows Forms timers suitable for small jobs, typically 
updating some aspect of the UI (e.g., a clock or countdown display). 


In terms of precision, the single-threaded timers are similar to the multithreaded 
timers (tens of milliseconds), although they are typically less accurate because they 
can be delayed while other UI requests (or other timer events) are processed. 
Parallel Programming








In this chapter, we cover the multithreading APIs and constructs aimed at
leveraging multicore processors:

• Parallel LINQ or PLINQ
• The Parallel class
• The task parallelism constructs
• The concurrent collections


These constructs are collectively known (loosely) as Parallel Framework (PFX). The
Parallel class together with the task parallelism constructs is called the Task
Parallel Library (TPL).


You'll need to be comfortable with the fundamentals in Chapter 14 before reading 
this chapter—particularly locking, thread safety, and the Task class. 


.NET Core offers a number of additional specialized APIs to
help with parallel and asynchronous programming:

• System.Threading.Channels.Channel is a high-performance
asynchronous producer/consumer queue, new
to .NET Core 3.

• Microsoft Dataflow (in the System.Threading.Tasks.Dataflow
namespace) is a sophisticated API for
creating networks of buffered blocks that execute actions
or data transformations in parallel, with a semblance to
actor/agent programming.

• Reactive Extensions implements LINQ over IObservable
(an alternative abstraction to IAsyncEnumerable) and
excels at combining asynchronous streams. Reactive
Extensions ships in the System.Reactive NuGet package.
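To give a flavor of the first of these, here's a minimal producer/consumer sketch using Channel (ChannelDemo is an illustrative name). The producer writes five integers and calls Complete, which lets the consumer's await foreach over ReadAllAsync finish:

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

static class ChannelDemo
{
  public static async Task<int> SumAsync()
  {
    Channel<int> channel = Channel.CreateUnbounded<int>();

    Task producer = Task.Run (async () =>
    {
      for (int i = 1; i <= 5; i++)
        await channel.Writer.WriteAsync (i);
      channel.Writer.Complete();       // No more items: ends ReadAllAsync
    });

    int sum = 0;
    await foreach (int item in channel.Reader.ReadAllAsync())
      sum += item;

    await producer;
    return sum;
  }
}
```

SumAsync() produces 15. Without the call to Complete, the consumer's loop would wait indefinitely for more items.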
Why PFX?


Over the past 15 years, CPU manufacturers have shifted from single-core to
multicore processors. This is problematic for us as programmers because single-threaded
code does not automatically run faster as a result of those extra cores.


Utilizing multiple cores is easy for most server applications, where each thread can 
independently handle a separate client request, but it’s more difficult on the desktop 
because it typically requires that you take your computationally intensive code and 
do the following: 


1. Partition it into small chunks. 
2. Execute those chunks in parallel via multithreading. 


3. Collate the results as they become available, in a thread-safe and performant 
manner. 
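To see why this is awkward with the classic constructs, here is a minimal sketch of the three steps done by hand for summing an array (ManualParallelSum is an illustrative name). Giving each thread its own slot in partials sidesteps locking, but the partitioning and collation logic is still ours to write:

```csharp
using System;
using System.Threading;

static class ManualParallelSum
{
  public static long Sum (int[] data, int threadCount)
  {
    var partials = new long[threadCount];        // One slot per thread: no locking needed
    var threads = new Thread[threadCount];
    int chunk = (data.Length + threadCount - 1) / threadCount;

    for (int t = 0; t < threadCount; t++)
    {
      int index = t;                              // Step 1: partition into chunks
      int start = index * chunk;
      int end = Math.Min (start + chunk, data.Length);
      threads[t] = new Thread (() =>
      {
        long local = 0;
        for (int i = start; i < end; i++) local += data[i];
        partials[index] = local;                  // Step 2: execute chunks in parallel
      });
      threads[t].Start();
    }

    foreach (Thread th in threads) th.Join();
    long total = 0;
    foreach (long p in partials) total += p;      // Step 3: collate the results
    return total;
  }
}
```

For instance, summing the integers 1 through 1,000 with four threads yields 500500, the same as a single-threaded loop.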


Although you can do all of this with the classic multithreading constructs, it’s
awkward, particularly the steps of partitioning and collating. A further problem is that
the usual strategy of locking for thread safety causes a lot of contention when many
threads work on the same data at once.


The PFX libraries have been designed specifically to help in these scenarios. 


Programming to leverage multicores or multiple processors is 
called parallel programming. This is a subset of the broader 
concept of multithreading. 


PFX Concepts 


There are two strategies for partitioning work among threads: data parallelism and 
task parallelism. 


When a set of tasks must be performed on many data values, we can parallelize by
having each thread perform the (same) set of tasks on a subset of values. This is
called data parallelism because we are partitioning the data between threads. In
contrast, with task parallelism we partition the tasks; in other words, we have each
thread perform a different task.


In general, data parallelism is easier and scales better to highly parallel hardware 
because it reduces or eliminates shared data (thereby reducing contention and 
thread-safety issues). Also, data parallelism exploits the fact that there are often 
more data values than discrete tasks, increasing the parallelism potential. 
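Here is a rough illustration of the distinction, as a sketch with hypothetical class and method names. Parallel.For applies one operation across a range of indices, which is data parallelism. Parallel.Invoke runs two different operations side by side, which is task parallelism.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelismKinds
{
    // Data parallelism: the same operation applied to different slices of the data.
    public static double[] SquareRoots (int count)
    {
        var results = new double [count];
        Parallel.For (0, count, i => results [i] = Math.Sqrt (i));   // each index written by one iteration only
        return results;
    }

    // Task parallelism: two different operations running side by side.
    public static (int sum, int max) SumAndMax (int[] data)
    {
        int sum = 0, max = int.MinValue;
        Parallel.Invoke (
            () => { foreach (int n in data) sum += n; },
            () => { foreach (int n in data) max = Math.Max (max, n); });
        return (sum, max);   // Parallel.Invoke returns only after both actions finish
    }
}
```

Each lambda in SumAndMax writes to its own variable, so the two tasks share no mutable state. If they wrote to the same variable, you would need synchronization.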


Data parallelism is also conducive to structured parallelism, which means that
parallel work units start and finish in the same place in your program. In contrast, task
parallelism tends to be unstructured, meaning that parallel work units may start and
finish in places scattered across your program. Structured parallelism is simpler and
less error-prone and allows you to farm the difficult job of partitioning and thread
coordination (and even result collation) out to libraries.
PFX Components 


PFX comprises two layers of functionality, as shown in Figure 23-1. The higher layer 
consists of two structured data parallelism APIs: PLINQ and the Parallel class. The 
lower layer contains the task parallelism classes—plus a set of additional constructs 
to help with parallel programming activities. 
Figure 23-1. PFX components (structured data parallelism; Task Parallel Library (TPL); spinning primitives; CLR thread pool)
PLINQ offers the richest functionality: it automates all the steps of parallelization,
including partitioning the work into tasks, executing those tasks on threads, and
collating the results into a single output sequence. It’s called declarative because
you simply declare that you want to parallelize your work (which you structure as a
LINQ query) and let the Framework take care of the implementation details. In
contrast, the other approaches are imperative, in that you need to explicitly write
code to partition or collate. As the following synopsis shows, in the case of the
Parallel class, you must collate results yourself; with the task parallelism
constructs, you must partition the work yourself, too:


                          Partitions work    Collates results
PLINQ                     Yes                Yes
The Parallel class        Yes                No
PFX’s task parallelism    No                 No





The concurrent collections and spinning primitives help you with lower-level
parallel programming activities. These are important because PFX has been designed to
work not only with today’s hardware, but also with future generations of processors
with far more cores. If you want to move a pile of chopped wood and you have 32
workers to do the job, the biggest challenge is moving the wood without the workers
getting in each other’s way. It’s the same with dividing an algorithm among 32 cores:
if ordinary locks are used to protect common resources, the resultant blocking can
mean that only a fraction of those cores are ever actually busy at once. The
concurrent collections are tuned specifically for highly concurrent access, with the focus on
minimizing or eliminating blocking. PLINQ and the Parallel class themselves rely
on the concurrent collections and on spinning primitives for efficient management
of work.





Other Uses for PFX 


The parallel programming constructs are useful not only for leveraging multicores, 
but in other scenarios: 


• The concurrent collections are sometimes appropriate when you want a
thread-safe queue, stack, or dictionary.


• BlockingCollection provides an easy means to implement producer/consumer
structures, and is a good way to limit concurrency.


• Tasks are the basis of asynchronous programming, as we saw in Chapter 14.
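Here is a minimal sketch of the BlockingCollection point. It assumes a hypothetical producer that simply generates the numbers 1 to itemCount. The bounded capacity limits how far the producer can run ahead of the consumer.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class BlockingCollectionDemo
{
    public static int ProduceAndConsume (int itemCount)
    {
        // A bounded capacity of 5: Add blocks whenever 5 items are waiting.
        using (var queue = new BlockingCollection<int> (boundedCapacity: 5))
        {
            Task producer = Task.Run (() =>
            {
                for (int i = 1; i <= itemCount; i++)
                    queue.Add (i);          // blocks while the queue is full
                queue.CompleteAdding();     // lets the consumer's loop finish
            });

            int total = 0;
            foreach (int item in queue.GetConsumingEnumerable())
                total += item;              // blocks while the queue is empty

            producer.Wait();
            return total;
        }
    }
}
```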











When to Use PFX 


The primary use case for PFX is parallel programming: leveraging multicore
processors to speed up computationally intensive code.


A challenge in parallel programming is Amdahl’s law, which states that the
maximum performance improvement from parallelization is governed by the portion of
the code that must execute sequentially. For instance, if only two-thirds of an
algorithm’s execution time is parallelizable, you can never exceed a threefold
performance gain, even with an infinite number of cores.
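Expressed as a formula, using the standard statement of Amdahl’s law, the speedup S on n cores is bounded as follows, where p is the parallelizable fraction of execution time. Substituting p = 2/3 gives the threefold ceiling mentioned above:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
                         = \frac{1}{1 - \tfrac{2}{3}} = 3
```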


So, before proceeding, it’s worth verifying that the bottleneck is in parallelizable 
code. It’s also worth considering whether your code needs to be computationally 
intensive—optimization is often the easiest and most effective approach. There’s a 
trade-off, though, in that some optimization techniques can make it more difficult 
to parallelize code. 


The easiest gains come with what's called embarrassingly parallel problems: this is
when a job can be easily divided into tasks that efficiently execute on their own
(structured parallelism is very well suited to such problems). Examples include
many image-processing tasks, ray tracing, and brute-force approaches in mathematics
or cryptography. An example of a non-embarrassingly parallel problem is implementing
an optimized version of the quicksort algorithm: a good result takes some
thought and might require unstructured parallelism.
PLINQ 


PLINQ automatically parallelizes local LINQ queries. PLINQ has the advantage of 
being easy to use in that it offloads the burden of both work partitioning and result 
collation to .NET Core. 


To use PLINQ, simply call AsParallel() on the input sequence and then continue 
the LINQ query as usual. The following query calculates the prime numbers 
between 3 and 100,000, making full use of all cores on the target machine: 


// Calculate prime numbers using a simple (unoptimized) algorithm.
IEnumerable<int> numbers = Enumerable.Range (3, 100000-3);

var parallelQuery =
  from n in numbers.AsParallel()
  where Enumerable.Range (2, (int) Math.Sqrt (n)).All (i => n % i > 0)
  select n;

int[] primes = parallelQuery.ToArray();


AsParallel is an extension method in System.Linq.ParallelEnumerable. It wraps
the input in a sequence based on ParallelQuery<TSource>, which causes the LINQ 
query operators that you subsequently call to bind to an alternate set of extension 
methods defined in ParallelEnumerable. These provide parallel implementations 
of each of the standard query operators. Essentially, they work by partitioning the 
input sequence into chunks that execute on different threads, collating the results 
back into a single output sequence for consumption, as depicted in Figure 23-2. 
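Here is a self-contained version of the preceding prime-number query. It is split into hypothetical helper methods so that you can compare the parallel and sequential forms. The explicit ParallelQuery<int> type shows what AsParallel returns.

```csharp
using System;
using System.Linq;

static class PrimeDemo
{
    // Same trial-division test as the query above (deliberately unoptimized).
    static bool IsPrime (int n) =>
        Enumerable.Range (2, (int) Math.Sqrt (n)).All (i => n % i > 0);

    public static int[] PrimesParallel (int limit)
    {
        ParallelQuery<int> query = Enumerable.Range (3, limit - 3)
            .AsParallel()                 // from here on, operators bind to ParallelEnumerable
            .Where (n => IsPrime (n));
        return query.ToArray();           // execution happens here; order is not guaranteed
    }

    public static int[] PrimesSequential (int limit) =>
        Enumerable.Range (3, limit - 3).Where (n => IsPrime (n)).ToArray();
}
```

The parallel query doesn't preserve ordering by default, so sort its output before comparing it element by element with the sequential result.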





Figure 23-2. PLINQ execution model ("abcdef".AsParallel().Select (c => char.ToUpper(c)).ToArray(): ParallelEnumerable.Select converts a and b on thread 1, c and d on thread 2, and e and f on thread 3, and the results are collated into a single output sequence)


Calling AsSequential() unwraps a ParallelQuery sequence so that subsequent 
query operators bind to the standard query operators and execute sequentially. This 
is necessary before calling methods that have side effects or are not thread-safe. 
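The following sketch (hypothetical names) filters in parallel and then calls AsSequential before a Select whose lambda increments a shared counter. After AsSequential, that Select binds to Enumerable.Select and runs on the consuming thread only, so the non-thread-safe increment is safe.

```csharp
using System.Collections.Generic;
using System.Linq;

static class AsSequentialDemo
{
    public static List<(int Label, int Value)> LabelMultiplesOfThree (int count)
    {
        int counter = 0;   // incrementing this from several threads would not be safe

        return Enumerable.Range (0, count)
            .AsParallel()
            .Where (n => n % 3 == 0)                     // runs in parallel
            .AsSequential()                              // back to ordinary LINQ to Objects
            .Select (n => (Label: counter++, Value: n))  // runs on the consuming thread only
            .ToList();
    }
}
```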


For query operators that accept two input sequences (Join, GroupJoin, Concat, 
Union, Intersect, Except, and Zip), you must apply AsParallel() to both input 
sequences (otherwise, an exception is thrown). You don't, however, need to keep 
applying AsParallel to a query as it progresses, because PLINQ’s query operators
output another ParallelQuery sequence. In fact, calling AsParallel again introduces
inefficiency in that it forces merging and repartitioning of the query:


mySequence.AsParallel()           // Wraps sequence in ParallelQuery<int>
          .Where (n => n > 100)   // Outputs another ParallelQuery<int>
          .AsParallel()           // Unnecessary - and inefficient!
          .Select (n => n * n)


Not all query operators can be effectively parallelized. For those that cannot (see 
“PLINQ Limitations” on page 930), PLINQ implements the operator sequentially, 
instead. PLINQ might also operate sequentially if it suspects that the overhead of 
parallelization will actually slow a particular query. 


PLINQ is only for local collections: it doesn’t work with Entity Framework, for
instance, because in those cases the LINQ translates into SQL, which then executes
on a database server. However, you can use PLINQ to perform additional local
querying on the result sets obtained from database queries.


If a PLINQ query throws an exception, it’s rethrown as an
AggregateException whose InnerExceptions property contains
the real exception (or exceptions). For more details, see
“Working with AggregateException” on page 956.
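Here is a minimal sketch of catching such an exception, assuming a hypothetical query that divides by zero for some elements. More than one worker may fail before the query stops, so the sketch collects the distinct exception types rather than assuming exactly one inner exception.

```csharp
using System;
using System.Linq;

static class PlinqExceptionDemo
{
    // Returns the distinct exception types that PLINQ packaged into the AggregateException.
    public static Type[] CaptureInnerExceptionTypes()
    {
        try
        {
            Enumerable.Range (0, 1000)
                .AsParallel()
                .Select (n => 100 / (n % 250))   // DivideByZeroException whenever n % 250 == 0
                .ToArray();
        }
        catch (AggregateException ae)
        {
            return ae.InnerExceptions.Select (ex => ex.GetType()).Distinct().ToArray();
        }
        return Array.Empty<Type>();              // not reached for this input
    }
}
```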





Why Isn’t AsParallel the Default? 


Given that AsParallel transparently parallelizes LINQ queries, the question arises: 
Why didn't Microsoft simply parallelize the standard query operators and make 
PLINQ the default? 


There are a number of reasons for the opt-in approach. First, for PLINQ to be useful
there must be a reasonable amount of computationally intensive work for it to farm
out to worker threads. Most LINQ-to-Objects queries execute very quickly; thus,
not only would parallelization be unnecessary, but the overhead of partitioning,
collating, and coordinating the extra threads might actually slow things down.


Additionally:


• The output of a PLINQ query (by default) can differ from a LINQ query with
respect to element ordering (see “PLINQ and Ordering” on page 929).


• PLINQ wraps exceptions in an AggregateException (to handle the possibility
of multiple exceptions being thrown).


• PLINQ will give unreliable results if the query invokes thread-unsafe methods.


Finally, PLINQ offers quite a few hooks for tuning and tweaking. Burdening the 
standard LINQ-to-Objects API with such nuances would add distraction. 
Parallel Execution Ballistics 


Like ordinary LINQ queries, PLINQ queries are lazily evaluated. This means that 
execution is triggered only when you begin consuming the results—typically via a 
foreach loop (although it can also be via a conversion operator such as ToArray or 
an operator that returns a single element or value). 


As you enumerate the results, though, execution proceeds somewhat differently
from that of an ordinary sequential query. A sequential query is powered entirely by
the consumer in a pull fashion: each element from the input sequence is fetched
exactly when required by the consumer. A parallel query ordinarily uses independent
threads to fetch elements from the input sequence slightly ahead of when they’re
needed by the consumer (rather like a teleprompter for newsreaders). It then
processes the elements in parallel through the query chain, holding the results in a
small buffer so that they’re ready for the consumer on demand. If the consumer
pauses or breaks out of the enumeration early, the query processor also pauses or
stops so as not to waste CPU time or memory.


You can tweak PLINQ’s buffering behavior by calling
WithMergeOptions (passing a ParallelMergeOptions value)
after AsParallel. The default value, AutoBuffered, generally
gives the best overall results. NotBuffered disables the
buffer and is useful if you want to see results as soon as
possible; FullyBuffered caches the entire result set before
presenting it to the consumer (the OrderBy and Reverse
operators naturally work this way, as do the element,
aggregation, and conversion operators).
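As a sketch (hypothetical names), here is how you might request NotBuffered output when you consume a query with foreach. The option makes little sense with ToArray or ToList, which must wait for every element anyway.

```csharp
using System.Collections.Generic;
using System.Linq;

static class MergeOptionsDemo
{
    public static List<int> DoubledNotBuffered (int count)
    {
        var query = Enumerable.Range (0, count)
            .AsParallel()
            .WithMergeOptions (ParallelMergeOptions.NotBuffered)   // hand over each result as soon as it's ready
            .Select (n => n * 2);

        var received = new List<int>();
        foreach (int n in query)   // the loop body runs on this thread only
            received.Add (n);
        return received;
    }
}
```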


PLINQ and Ordering 


A side effect of parallelizing the query operators is that when the results are collated, 
it’s not necessarily in the same order that they were submitted (see Figure 23-2). In 
other words, LINQ’s normal order-preservation guarantee for sequences no longer 
holds. If you need order preservation, you can force it by calling AsOrdered() after 
AsParallel(): 


myCollection.AsParallel().AsOrdered()...


Calling AsOrdered incurs a performance hit with large numbers of elements because 
PLINQ must keep track of each element's original position. 


You can negate the effect of AsOrdered later in a query by calling AsUnordered: this
introduces a “random shuffle point,” which allows the query to execute more
efficiently from that point on. So, if you wanted to preserve input-sequence ordering
for just the first two query operators, you’d do this:


inputSequence.AsParallel().AsOrdered()
  .QueryOperator1()
  .QueryOperator2()
  .AsUnordered()       // From here on, ordering doesn't matter
  .QueryOperator3()
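Here is a concrete version of this pattern, as a sketch with hypothetical method names. AsOrdered makes Take return the first five input elements, and AsUnordered then frees the final Select from preserving their order. A second method shows that AsOrdered alone produces the same order as a sequential query.

```csharp
using System.Linq;

static class OrderingDemo
{
    // AsOrdered makes Take (5) pick the first five input elements (0 to 4);
    // AsUnordered then lets Select emit their squares in any order.
    public static int[] SquaresOfFirstFive (int count) =>
        Enumerable.Range (0, count)
            .AsParallel().AsOrdered()
            .Take (5)
            .AsUnordered()
            .Select (n => n * n)
            .ToArray();

    // With AsOrdered alone, output order matches the equivalent sequential query.
    public static int[] OrderedDoubles (int count) =>
        Enumerable.Range (0, count)
            .AsParallel().AsOrdered()
            .Select (n => n * 2)
            .ToArray();
}
```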
AsOrdered is not the default because for most queries, the original input ordering
doesn’t matter. In other words, if AsOrdered were the default, you’d need to apply
AsUnordered to the majority of your parallel queries to get the best performance,
which would be burdensome.


PLINQ Limitations 


There are practical limitations on what PLINQ can parallelize. The following query 
operators prevent parallelization by default unless the source elements are in their 
original indexing position: 


The indexed versions of Select, SelectMany, and ElementAt


Most query operators change the indexing position of elements (including those
that remove elements, such as Where). This means that if you want to use the
preceding operators, they'll usually need to be at the start of the query.


The following query operators are parallelizable but use an expensive partitioning 
strategy that can sometimes be slower than sequential processing: 


Join, GroupBy, GroupJoin, Distinct, Union, Intersect, and Except 


The Aggregate operator’s seeded overloads in their standard incarnations are not
parallelizable; PLINQ provides special overloads to deal with this (see “Optimizing
PLINQ” on page 934).


All other operators are parallelizable, although use of these operators doesn't
guarantee that your query will be parallelized. PLINQ might run your query sequentially
if it suspects that the overhead of parallelization will slow down that particular
query. You can override this behavior and force parallelism by calling the following
after AsParallel():


.WithExecutionMode (ParallelExecutionMode.ForceParallelism)
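The following sketch shows where the call goes in a complete query. The query and names are hypothetical, and the work is far too cheap to benefit from parallelism in practice.

```csharp
using System.Linq;

static class ExecutionModeDemo
{
    public static int CountMultiplesOfSeven (int count) =>
        Enumerable.Range (0, count)
            .AsParallel()
            .WithExecutionMode (ParallelExecutionMode.ForceParallelism)   // skip PLINQ's "is it worth it?" heuristic
            .Count (n => n % 7 == 0);
}
```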


Example: Parallel Spellchecker 


Suppose that we want to write a spellchecker that runs quickly with very large
documents by utilizing all available cores. By formulating our algorithm into a LINQ
query, we can very easily parallelize it.


The first step is to download a dictionary of English words into a HashSet for
efficient lookup:


if (!File.Exists ("WordLookup.txt"))    // Contains about 150,000 words
  new WebClient().DownloadFile (
    "http://www.albahari.com/ispell/allwords.txt", "WordLookup.txt");

var wordLookup = new HashSet<string> (
  File.ReadAllLines ("WordLookup.txt"),
  StringComparer.InvariantCultureIgnoreCase);
We then use our word lookup to create a test document comprising an array of a 
million random words. After we build the array, let’s introduce a couple of spelling 
mistakes: 


var random = new Random();
string[] wordList = wordLookup.ToArray();

string[] wordsToTest = Enumerable.Range (0, 1000000)
  .Select (i => wordList [random.Next (0, wordList.Length)])
  .ToArray();

wordsToTest [12345] = "woozsh";     // Introduce a couple
wordsToTest [23456] = "wubsie";     // of spelling mistakes.


Now we can perform our parallel spellcheck by testing wordsToTest against 
wordLookup. PLINQ makes this very easy: 


var query = wordsToTest
  .AsParallel()
  .Select ((word, index) => new IndexedWord { Word=word, Index=index })
  .Where  (iword => !wordLookup.Contains (iword.Word))
  .OrderBy (iword => iword.Index);

foreach (var mistake in query)
  Console.WriteLine (mistake.Word + " - index = " + mistake.Index);

// OUTPUT:
// woozsh - index = 12345
// wubsie - index = 23456


IndexedWord is a custom struct that we define as follows: 
struct IndexedWord { public string Word; public int Index; } 


The wordLookup.Contains method in the predicate gives the query some “meat” 
and makes it worth parallelizing. 


We could simplify the query slightly by using an anonymous 
type instead of the IndexedWord struct. However, this would 
degrade performance because anonymous types (being classes 
and therefore reference types) incur the cost of heap-based 
allocation and subsequent garbage collection. 


The difference might not be enough to matter with sequential
queries, but with parallel queries, favoring stack-based
allocation can be quite advantageous. This is because stack-based
allocation is highly parallelizable (as each thread has its own
stack), whereas all threads must compete for the same heap,
managed by a single memory manager and garbage collector.
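To experiment with the spellchecker without downloading the word list, you could substitute a small in-memory dictionary. The following sketch assumes that substitution. SpellcheckDemo and FindMistakes are hypothetical names, and the query is the same one shown earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

struct IndexedWord { public string Word; public int Index; }

static class SpellcheckDemo
{
    // Returns every word not found in wordLookup, with its position, sorted by position.
    // Like the original query, this only reads from wordLookup while the query runs.
    public static IndexedWord[] FindMistakes (string[] wordsToTest, HashSet<string> wordLookup) =>
        wordsToTest
            .AsParallel()
            .Select ((word, index) => new IndexedWord { Word = word, Index = index })
            .Where (iword => !wordLookup.Contains (iword.Word))
            .OrderBy (iword => iword.Index)
            .ToArray();
}
```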


Using ThreadLocal<T> 


Let’s extend our example by parallelizing the creation of the random test-word list 
itself. We structured this as a LINQ query, so it should be easy. Here’s the sequential 
version: 
string[] wordsToTest = Enumerable.Range (0, 1000000)
  .Select (i => wordList [random.Next (0, wordList.Length)])
  .ToArray();


Unfortunately, the call to random.Next is not thread-safe, so it’s not as simple as
inserting AsParallel() into the query. A potential solution is to write a function
that locks around random.Next; however, this would limit concurrency. The better
option is to use ThreadLocal<Random> (see “Thread-Local Storage” on page 914 in
Chapter 22) to create a separate Random object for each thread. We then can
parallelize the query as follows:


var localRandom = new ThreadLocal<Random>
 ( () => new Random (Guid.NewGuid().GetHashCode()) );

string[] wordsToTest = Enumerable.Range (0, 1000000).AsParallel()
  .Select (i => wordList [localRandom.Value.Next (0, wordList.Length)])
  .ToArray();


In our factory function for instantiating a Random object, we pass in a Guid’s
hashcode to ensure that if two Random objects are created within a short period of time,
they'll yield different random number sequences.
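Here is the same technique packaged as a reusable method (hypothetical names). The sketch also disposes of the ThreadLocal<Random> once the query has been fully evaluated by ToArray.

```csharp
using System;
using System.Linq;
using System.Threading;

static class ThreadLocalRandomDemo
{
    public static string[] RandomWords (string[] wordList, int count)
    {
        // One Random per thread, seeded from a Guid so that instances created
        // close together in time don't produce identical sequences.
        using (var localRandom = new ThreadLocal<Random> (
                   () => new Random (Guid.NewGuid().GetHashCode())))
        {
            return Enumerable.Range (0, count)
                .AsParallel()
                .Select (i => wordList [localRandom.Value.Next (0, wordList.Length)])
                .ToArray();   // fully evaluated before localRandom is disposed
        }
    }
}
```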





When to Use PLINQ 


It's tempting to search your existing applications for LINQ queries and experiment 
with parallelizing them. This is usually unproductive, because most problems for 
which LINQ is obviously the best solution tend to execute very quickly and so don't 
benefit from parallelization. A better approach is to find a CPU-intensive bottleneck 
and then consider whether it can be expressed as a LINQ query. (A welcome side 
effect of such restructuring is that LINQ typically makes code smaller and more 
readable.) 


PLINQ is well suited to embarrassingly parallel problems. It can be a poor choice 
for imaging, however, because collating millions of pixels into an output sequence 
creates a bottleneck. Instead, it’s better to write pixels directly to an array or
unmanaged memory block and use the Parallel class or task parallelism to manage the
multithreading. (It is possible, however, to defeat result collation using ForAll; we
discuss this in “Optimizing PLINQ” on page 934. Doing so makes sense if the
image-processing algorithm naturally lends itself to LINQ.)











Functional Purity 


Because PLINQ runs your query on parallel threads, you must be careful not to
perform thread-unsafe operations. In particular, writing to variables is side-effecting
and therefore thread-unsafe:


// The following query multiplies each element by its position.
// Given an input of Enumerable.Range(0,999), it should output squares.
int i = 0;
var query = from n in Enumerable.Range(0,999).AsParallel() select n * i++;
We could make incrementing i thread-safe by using locks, but the problem would 
still remain that i won't necessarily correspond to the position of the input element. 
And adding AsOrdered to the query wouldn't fix the latter problem, because 
AsOrdered ensures only that the elements are output in an order consistent with 
them having been processed sequentially—it doesn't actually process them 
sequentially. 


The correct solution is to rewrite our query to use the indexed version of Select: 
var query = Enumerable.Range(0,999).AsParallel().Select ((n, i) => n * i); 
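To confirm that the indexed version produces squares, the following sketch (hypothetical names) runs it and returns the results. Because PLINQ supplies the index, no shared counter is mutated.

```csharp
using System.Linq;

static class IndexedSelectDemo
{
    // Each element is multiplied by its index, which PLINQ supplies;
    // without AsOrdered, the output order is not guaranteed.
    public static int[] Squares (int count) =>
        Enumerable.Range (0, count)
            .AsParallel()
            .Select ((n, i) => n * i)
            .ToArray();
}
```

The output can arrive in any order, so sort it before checking it against the expected squares.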


For best performance, any methods called from query operators should be
thread-safe by virtue of not writing to fields or properties (non-side-effecting, or
functionally pure). If they're thread-safe by virtue of locking, the query’s parallelism potential
will be limited by the duration of the lock divided by the total time spent in that
function.


Setting the Degree of Parallelism 


By default, PLINQ chooses an optimum degree of parallelism for the processor in 
use. You can override it by calling WithDegreeOfParallelism after AsParallel: 


...AsParallel().WithDegreeOfParallelism(4)...


An example of when you might increase the parallelism beyond the core count is
with I/O-bound work (downloading many web pages at once, for instance).
However, task combinators and asynchronous functions provide a similarly easy and
more efficient solution (see “Task Combinators” on page 629 in Chapter 14). Unlike
with Tasks, PLINQ cannot perform I/O-bound work without blocking threads (and
pooled threads, to make matters worse).
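The following sketch (hypothetical names) caps a CPU-bound query at a given degree of parallelism. It records the highest number of delegates seen running at the same moment, which should never exceed the requested degree.

```csharp
using System.Linq;
using System.Threading;

static class DegreeOfParallelismDemo
{
    // Runs a CPU-bound query limited to 'dop' workers and returns the highest
    // number of delegates observed executing at the same time.
    public static int MaxObservedConcurrency (int dop)
    {
        int current = 0, peak = 0;

        Enumerable.Range (0, 200)
            .AsParallel()
            .WithDegreeOfParallelism (dop)
            .Select (n =>
            {
                int now = Interlocked.Increment (ref current);

                // Raise 'peak' to 'now' if it is higher (lock-free compare-and-swap loop).
                int seen;
                while (now > (seen = Volatile.Read (ref peak)) &&
                       Interlocked.CompareExchange (ref peak, now, seen) != seen) { }

                Thread.SpinWait (10000);   // simulate some CPU-bound work
                Interlocked.Decrement (ref current);
                return n;
            })
            .ToArray();

        return peak;
    }
}
```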


Changing the degree of parallelism 


You can call WithDegreeOfParallelism only once within a PLINQ query. If you
need to call it again, you must force merging and repartitioning of the query by
calling AsParallel() again within the query:

"The Quick Brown Fox"
  .AsParallel().WithDegreeOfParallelism (2)
  .Where (c => !char.IsWhiteSpace (c))
  .AsParallel().WithDegreeOfParallelism (3)   // Forces Merge + Partition
  .Select (c => char.ToUpper (c))


Cancellation 


Canceling a PLINQ query whose results you’re consuming in a foreach loop is 
easy: simply break out of the foreach and the query will be automatically canceled 
as the enumerator is implicitly disposed. 


For a query that terminates with a conversion, element, or aggregation operator, you 
can cancel it from another thread via a cancellation token (see “Cancellation” on 
page 625 in Chapter 14). To insert a token, call WithCancellation after calling 
AsParallel, passing in the Token property of a CancellationTokenSource object. 
Another thread can then call Cancel on the token source, which throws an 
OperationCanceledException on the query’s consumer: 


IEnumerable<int> million = Enumerable.Range (3, 1000000);

var cancelSource = new CancellationTokenSource();

var primeNumberQuery =
  from n in million.AsParallel().WithCancellation (cancelSource.Token)
  where Enumerable.Range (2, (int) Math.Sqrt (n)).All (i => n % i > 0)
  select n;

new Thread (() => {
                    Thread.Sleep (100);      // Cancel query after
                    cancelSource.Cancel();   // 100 milliseconds.
                  }
           ).Start();
try
{
  // Start query running:
  int[] primes = primeNumberQuery.ToArray();
  // We'll never get here because the other thread will cancel us.
}
catch (OperationCanceledException)
{
  Console.WriteLine ("Query canceled");
}


Upon cancellation, PLINQ waits for each worker thread to finish with its current 
element before ending the query. This means that any external methods that the 
query calls will run to completion. 
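Here is a minimal, deterministic variation of the preceding example (hypothetical names). Rather than relying on a timer, the query cancels its own token when it reaches a particular element. This only makes the demonstration repeatable; in real code, Cancel would normally be called from another thread, as shown above.

```csharp
using System;
using System.Linq;
using System.Threading;

static class CancellationDemo
{
    // Returns true if the query ended with OperationCanceledException.
    public static bool RunAndCancel()
    {
        using (var cancelSource = new CancellationTokenSource())
        {
            var query = Enumerable.Range (0, 1000000)
                .AsParallel()
                .WithCancellation (cancelSource.Token)
                .Select (n =>
                {
                    if (n == 1000) cancelSource.Cancel();   // simulate a cancel request mid-query
                    return n * 2;
                });

            try
            {
                query.ToArray();
                return false;                                // ran to completion
            }
            catch (OperationCanceledException)
            {
                return true;                                 // PLINQ observed the canceled token
            }
        }
    }
}
```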


Optimizing PLINQ 


Output-side optimization 


One of PLINQ’s advantages is that it conveniently collates the results from
parallelized work into a single output sequence. Sometimes, though, all that you end up
doing with that sequence is running some function once over each element:


foreach (int n in parallelQuery) 
DoSomething (n); 


If this is the case (and you don't care about the order in which the elements are
processed), you can improve efficiency with PLINQ’s ForAll method.


The ForAll method runs a delegate over every output element of a ParallelQuery.
It hooks directly into PLINQ’s internals, bypassing the steps of collating and
enumerating the results. Here’s a trivial example:


"abcdef".AsParallel().Select (c => char.ToUpper(c)).ForAll (Console.Write);







Figure 23-3 shows the process. 


Collating and enumerating results is not a massively expensive 
operation, so the ForAll optimization yields the greatest gains 
when there are large numbers of quickly executing input 
elements. 
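ForAll invokes the delegate on the worker threads themselves, so the delegate must be thread-safe. The following sketch (hypothetical names) therefore gathers the results into a ConcurrentBag<T> rather than a List<T>.

```csharp
using System.Collections.Concurrent;
using System.Linq;

static class ForAllDemo
{
    public static ConcurrentBag<char> UppercaseAll (string text)
    {
        var bag = new ConcurrentBag<char>();   // safe to Add to from several threads
        text.AsParallel()
            .Select (c => char.ToUpper (c))
            .ForAll (bag.Add);                 // runs on the workers; no collation step
        return bag;
    }
}
```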





Figure 23-3. PLINQ ForAll ("abcdef".AsParallel().Select (c => char.ToUpper(c)).ForAll (Console.Write): threads 1, 2, and 3 each pass their uppercased pair (AB, CD, EF) directly to Console.Write, with no collation step)


Input-side optimization 


PLINQ has three partitioning strategies for assigning input elements to threads: 


Strategy              Element allocation    Relative performance
Chunk partitioning    Dynamic               Average
Range partitioning    Static                Poor to excellent
Hash partitioning     Static                Poor





For query operators that require comparing elements (GroupBy, Join, GroupJoin,
Intersect, Except, Union, and Distinct), you have no choice: PLINQ always uses
hash partitioning. Hash partitioning is relatively inefficient in that it must
precalculate the hashcode of every element (so that elements with identical hashcodes can be
processed on the same thread). If you find this to be too slow, your only option is to
call AsSequential to disable parallelization.


For all other query operators, you have a choice as to whether to use range or chunk
partitioning. By default:

• If the input sequence is indexable (if it’s an array or implements IList<T>),
PLINQ chooses range partitioning.
• Otherwise, PLINQ chooses chunk partitioning.
In a nutshell, range partitioning is faster with long sequences for which every
element takes a similar amount of CPU time to process. Otherwise, chunk partitioning
is usually faster.


To force range partitioning: 


• If the query starts with Enumerable.Range, replace that method with
ParallelEnumerable.Range.


• Otherwise, simply call ToList or ToArray on the input sequence (obviously,
this incurs a performance cost in itself, which you should take into account).


ParallelEnumerable.Range is not simply a shortcut for calling
Enumerable.Range(...).AsParallel(). It changes the
performance of the query by activating range partitioning.


To force chunk partitioning, wrap the input sequence in a call to
Partitioner.Create (in System.Collections.Concurrent), as follows:


int[] numbers = { 3, 4, 5, 6, 7, 8, 9 };

var parallelQuery =
  Partitioner.Create (numbers, true).AsParallel()
  .Where (...)


The second argument to Partitioner.Create indicates that you want to
load-balance the query, which is another way of saying that you want chunk partitioning.


Chunk partitioning works by having each worker thread periodically grab small 
chunks of elements from the input sequence to process (see Figure 23-4). PLINQ 
starts by allocating very small chunks (one or two elements at a time). It then 
increases the chunk size as the query progresses: this ensures that small sequences 
are effectively parallelized and large sequences don't cause excessive round-tripping. 
If a worker happens to get “easy” elements (that process quickly), it will end up
getting more chunks. This system keeps every thread equally busy (and the cores
balanced); the only downside is that fetching elements from the shared input sequence
requires synchronization (typically an exclusive lock), and this can result in some
overhead and contention.
Figure 23-4. Chunk versus range partitioning 


Range partitioning bypasses the normal input-side enumeration and preallocates an 
equal number of elements to each worker, avoiding contention on the input 
sequence. But if some threads happen to get easy elements and finish early, they sit 
idle while the remaining threads continue working. Our earlier prime number
calculator might perform poorly with range partitioning. An example of when range
partitioning would do well is in calculating the sum of the square roots of the first 
10 million integers: 


ParallelEnumerable.Range (1, 10000000).Sum (i => Math.Sqrt (i)) 


ParallelEnumerable.Range returns a ParallelQuery<T>, so you don't need to
subsequently call AsParallel.
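Here are the two techniques described in this section side by side, as a sketch with hypothetical method names. Both compute the same sum of square roots; only the partitioning strategy differs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

static class PartitioningDemo
{
    // Range partitioning: ParallelEnumerable.Range preallocates ranges of elements
    // to each worker (good when every element costs about the same).
    public static double SumOfRootsRange (int count) =>
        ParallelEnumerable.Range (1, count).Sum (i => Math.Sqrt (i));

    // Chunk partitioning: loadBalance = true makes workers grab chunks on demand
    // (good when elements take uneven amounts of time).
    public static double SumOfRootsChunked (int[] numbers) =>
        Partitioner.Create (numbers, true)
            .AsParallel()
            .Sum (i => Math.Sqrt (i));
}
```

Floating-point addition isn't associative, so the two results can differ in the last few digits. Compare them with a tolerance rather than for exact equality.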


Range partitioning doesn’t necessarily allocate element ranges
in contiguous blocks; it might instead choose a “striping”
strategy. For instance, if there are two workers, one worker
might process odd-numbered elements while the other
processes even-numbered elements. The TakeWhile operator is
almost certain to trigger a striping strategy to avoid
unnecessarily processing elements later in the sequence.
Optimizing custom aggregations 


PLINQ parallelizes the Sum, Average, Min, and Max operators efficiently without
additional intervention. The Aggregate operator, though, presents special
challenges for PLINQ. As described in Chapter 9, Aggregate performs custom
aggregations. For example, the following sums a sequence of numbers, mimicking the Sum
operator:


int[] numbers = { 1, 2, 3 };
int sum = numbers.Aggregate (0, (total, n) => total + n);   // 6


We also saw in Chapter 9 that for unseeded aggregations, the supplied delegate must
be associative and commutative. PLINQ will give incorrect results if this rule is
violated, because it draws multiple seeds from the input sequence in order to aggregate
several partitions of the sequence simultaneously.
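Here is a sketch of an unseeded aggregation that satisfies this rule (hypothetical names). Math.Max is both associative and commutative, so PLINQ can reduce each partition independently and combine the partial results in any order.

```csharp
using System;
using System.Linq;

static class UnseededAggregateDemo
{
    // Safe to parallelize: Max(Max(a, b), c) == Max(a, Max(b, c)) and Max(a, b) == Max(b, a).
    public static int Max (int[] numbers) =>
        numbers.AsParallel().Aggregate ((a, b) => Math.Max (a, b));
}
```

By contrast, a delegate such as (a, b) => a - b is neither associative nor commutative, so a parallel unseeded Aggregate using it can return a different answer from the sequential version.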


Explicitly seeded aggregations might seem like a safe option with PLINQ, but
unfortunately these ordinarily execute sequentially because of the reliance on a
single seed. To mitigate this, PLINQ provides another overload of Aggregate that lets
you specify multiple seeds, or rather, a seed factory function. For each thread, it
executes this function to generate a separate seed, which becomes a thread-local
accumulator into which it locally aggregates elements.


You must also supply a function to indicate how to combine the local and main 
accumulators. Finally, this Aggregate overload (somewhat gratuitously) expects a 
delegate to perform any final transformation on the result (you can achieve this as 
easily by running some function on the result yourself afterward). So, here are the 
four delegates, in the order they are passed: 


seedFactory 
Returns a new local accumulator 


updateAccumulatorFunc 
Aggregates an element into a local accumulator 


combineAccumulatorFunc 
Combines a local accumulator with the main accumulator 


resultSelector 
Applies any final transformation on the end result 


In simple scenarios, you can specify a seed value instead of a
seed factory. This tactic fails when the seed is a reference type
that you want to mutate, because the same instance will then
be shared by each thread.


To give a very simple example, the following sums the values in a numbers array: 


numbers.AsParallel().Aggregate (
  () => 0,                                      // seedFactory
  (localTotal, n) => localTotal + n,            // updateAccumulatorFunc
  (mainTot, localTot) => mainTot + localTot,    // combineAccumulatorFunc
  finalResult => finalResult)                   // resultSelector


This example is contrived in that we could get the same answer just as efficiently
using simpler approaches (such as an unseeded aggregate, or better, the Sum
operator). To give a more realistic example, suppose that we want to calculate the
frequency of each letter in the English alphabet in a given string. A simple sequential
solution might look like this:


string text = "Let's suppose this is a really long string";
var letterFrequencies = new int[26];
foreach (char c in text)
{
int index = char.ToUpper (c) - 'A';
if (index >= 0 && index < 26) letterFrequencies [index]++;
}
An example of when the input text might be very long is in
gene sequencing. The “alphabet” would then consist of the letters
a, c, g, and t.


To parallelize this, we could replace the foreach statement with a call to 
Parallel.ForEach (which we cover in the following section), but this will leave us 
to deal with concurrency issues on the shared array. And locking around accessing 
that array would all but kill the potential for parallelization. 


Aggregate offers a tidy solution. The accumulator, in this case, is an array just like 
the letterFrequencies array in our preceding example. Here's a sequential version 
using Aggregate: 
int[] result =
text.Aggregate (
new int[26], // Create the "accumulator"
(letterFrequencies, c) => // Aggregate a letter into the accumulator
{
int index = char.ToUpper (c) - 'A';
if (index >= 0 && index < 26) letterFrequencies [index]++;
return letterFrequencies;
});


And now the parallel version, using PLINQ’s special overload: 


int[] result =
text.AsParallel().Aggregate (
() => new int[26], // Create a new local accumulator
(localFrequencies, c) => // Aggregate into the local accumulator
{
int index = char.ToUpper (c) - 'A';
if (index >= 0 && index < 26) localFrequencies [index]++;
return localFrequencies;
},
// Aggregate local->main accumulator
(mainFreq, localFreq) =>
mainFreq.Zip (localFreq, (f1, f2) => f1 + f2).ToArray(),
finalResult => finalResult // Perform any final transformation
); // on the end result.


Notice that the local accumulation function mutates the localFrequencies array.
Being able to perform this optimization is important—and it is legitimate because
localFrequencies is local to each thread.
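The following sketch puts the sequential and parallel versions side by side so that you can check they agree. The LetterFrequencyDemo class and its method names are hypothetical. The bounds check uses index < 26, so characters just past Z (such as '[') are ignored rather than indexing past the end of the array; we also use char.ToUpperInvariant (an assumption on our part) so that the result doesn't depend on the current culture.

```csharp
using System;
using System.Linq;

public static class LetterFrequencyDemo
{
    // Sequential: one accumulator array, mutated as each character is visited.
    public static int[] SequentialCount (string text) =>
        text.Aggregate (new int[26], (freq, c) => { Tally (freq, c); return freq; });

    // Parallel: the seed factory gives each partition its own array, and the
    // per-partition arrays are then summed element by element.
    public static int[] ParallelCount (string text) =>
        text.AsParallel().Aggregate (
            () => new int[26],                                   // local accumulator
            (local, c) => { Tally (local, c); return local; },   // aggregate into it
            (main, local) => main.Zip (local, (f1, f2) => f1 + f2).ToArray(), // combine
            final => final);                                     // no final transformation

    // Counts c if it is an English letter; anything else is ignored.
    static void Tally (int[] freq, char c)
    {
        int index = char.ToUpperInvariant (c) - 'A';
        if (index >= 0 && index < 26) freq [index]++;
    }
}
```

For instance, SequentialCount ("Hello World") should hold 3 at index 'L' - 'A', and ParallelCount should return the same array for any input.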


The Parallel Class 


PFX provides a basic form of structured parallelism via three static methods in the 
Parallel class: 


Parallel.Invoke
Executes an array of delegates in parallel


Parallel.For
Performs the parallel equivalent of a C# for loop


Parallel.ForEach
Performs the parallel equivalent of a C# foreach loop


All three methods block until all work is complete. As with PLINQ, after an unhandled
exception, remaining workers are stopped after their current iteration and the
exception (or exceptions) are thrown back to the caller—wrapped in an AggregateException
(see “Working with AggregateException” on page 956).


Parallel.Invoke 


Parallel.Invoke executes an array of Action delegates in parallel and then waits
for them to complete. The simplest version of the method is defined as follows:


public static void Invoke (params Action[] actions); 


Just as with PLINQ, the Parallel.* methods are optimized for compute-bound and
not I/O-bound work. However, downloading two web pages at once provides a simple
way to demonstrate Parallel.Invoke:


Parallel.Invoke (
() => new WebClient().DownloadFile ("http://www.linqpad.net", "lp.html"),
() => new WebClient().DownloadFile ("http://microsoft.com", "ms.html"));


On the surface, this seems like a convenient shortcut for creating and waiting on
two thread-bound Task objects. But there's an important difference:
Parallel.Invoke still works efficiently if you pass in an array of a million delegates.
This is because it partitions large numbers of elements into batches that it assigns to
a handful of underlying Tasks rather than creating a separate Task for each delegate.


As with all of Parallel’s methods, you’re on your own when it comes to collating 
the results. This means that you need to keep thread safety in mind. The following, 
for instance, is thread-unsafe: 
var data = new List<string>();

Parallel.Invoke ( 
() => data.Add (new WebClient().DownloadString ("http://www.foo.com")), 
() => data.Add (new WebClient().DownloadString ("http://www.far.com"))); 


Locking around adding to the list would resolve this, although locking would create 
a bottleneck if you had a much larger array of quickly executing delegates. A better 
solution is to use the thread-safe collections, which we cover in later sections— 
ConcurrentBag would be ideal in this case. 
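As a minimal sketch of that advice, the following replaces the downloads with trivial compute-bound stand-ins (an assumption made to keep the example self-contained and offline) and collates the results in a ConcurrentBag<string>. The InvokeCollationDemo name is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class InvokeCollationDemo
{
    public static ConcurrentBag<string> Run()
    {
        // ConcurrentBag<T>.Add is safe to call from several threads at once,
        // unlike List<T>.Add.
        var data = new ConcurrentBag<string>();

        Parallel.Invoke (
            () => data.Add ("first".ToUpperInvariant()),    // stand-ins for real work
            () => data.Add ("second".ToUpperInvariant()),
            () => data.Add ("third".ToUpperInvariant()));

        // Parallel.Invoke blocks until every delegate has finished,
        // so all three results are present here.
        return data;
    }
}
```

Note that the bag makes no promise about order: enumerating it may yield the three strings in any sequence.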


Parallel.Invoke is also overloaded to accept a ParallelOptions object:


public static void Invoke (ParallelOptions options, 
params Action[] actions); 


With ParallelOptions, you can insert a cancellation token, limit the maximum
concurrency, and specify a custom task scheduler. A cancellation token is relevant
when you're executing (roughly) more tasks than you have cores: upon cancellation,
any unstarted delegates will be abandoned. Any already executing delegates will,
however, continue to completion. See “Cancellation” on page 933 for an example of
how to use cancellation tokens.
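Here is a small sketch of limiting concurrency with ParallelOptions.MaxDegreeOfParallelism. The hypothetical MaxObservedConcurrency helper runs a batch of short delegates and records the highest number that were ever executing at the same moment; with the limit set to 2, that figure should never exceed 2. The Thread.Sleep call merely stands in for real work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelOptionsDemo
{
    // Runs 'count' short delegates, allowing at most 'maxDegree' at once,
    // and returns the peak number observed running concurrently.
    public static int MaxObservedConcurrency (int count, int maxDegree)
    {
        int running = 0, peak = 0;
        object gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Action work = () =>
        {
            int now = Interlocked.Increment (ref running);
            lock (gate) peak = Math.Max (peak, now);   // remember the high-water mark
            Thread.Sleep (10);                         // stand-in for real work
            Interlocked.Decrement (ref running);
        };

        Parallel.Invoke (options, Enumerable.Repeat (work, count).ToArray());
        return peak;
    }
}
```

The same ParallelOptions object is where you would also set CancellationToken or TaskScheduler.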


Parallel.For and Parallel.ForEach 


Parallel.For and Parallel.ForEach perform the equivalent of a C# for and 
foreach loop but with each iteration executing in parallel instead of sequentially. 
Here are their (simplest) signatures: 


public static ParallelLoopResult For ( 
int fromInclusive, int toExclusive, Action<int> body) 


public static ParallelLoopResult ForEach<TSource> ( 
IEnumerable<TSource> source, Action<TSource> body)


This sequential for loop: 


for (int i = 0; i < 100; i++) 
Foo (i); 


is parallelized like this: 

Parallel.For (0, 100, i => Foo (i)); 
or more simply: 

Parallel.For (0, 100, Foo); 
And this sequential foreach: 


foreach (char c in "Hello, world") 
Foo (c);


is parallelized like this: 


Parallel.ForEach ("Hello, world", Foo); 
To give a practical example, if we import the System.Security.Cryptography
namespace, we can generate six public/private keypair strings in parallel, as follows: 


var keyPairs = new string[6]; 


Parallel.For (0, keyPairs.Length, 
i => keyPairs[i] = RSA.Create().ToXmlString (true)); 


As with Parallel.Invoke, we can feed Parallel.For and Parallel.ForEach a 
large number of work items and they'll be efficiently partitioned onto a few tasks. 


The latter query could also be done with PLINQ: 


string[] keyPairs =
ParallelEnumerable.Range (0, 6)
.Select (i => RSA.Create().ToXmlString (true))
.ToArray();


Outer versus inner loops 


Parallel.For and Parallel.ForEach usually work best on outer rather than inner
loops. This is because with the former, you're offering larger chunks of work to
parallelize, diluting the management overhead. Parallelizing both inner and outer loops
is usually unnecessary. In the following example, we'd typically need more than 100
cores to benefit from the inner parallelization:


Parallel.For (0, 100, i =>
{
Parallel.For (0, 50, j => Foo (i, j)); // Sequential would be better
}); // for the inner loop.


Indexed Parallel.ForEach 


Sometimes, it’s useful to know the loop iteration index. With a sequential foreach, 
it’s easy: 
int i = 0; 
foreach (char c in "Hello, world") 
Console.WriteLine (c.ToString() + i++); 


Incrementing a shared variable, however, is not thread-safe in a parallel context. 
You must instead use the following version of ForEach: 


public static ParallelLoopResult ForEach<TSource> ( 
IEnumerable<TSource> source, Action<TSource,ParallelLoopState,long> body)


We'll ignore ParallelLoopState (which we cover in the following section). For
now, we're interested in Action’s third type parameter of type long, which indicates
the loop index:


Parallel.ForEach ("Hello, world", (c, state, i) => 
{ 


Console.WriteLine (c.ToString() + i); 


}); 
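Because iterations complete in an unpredictable order, the index is what lets you put results back in sequence afterward. The sketch below (with a hypothetical IndexedForEachDemo class) records each character alongside its index in a ConcurrentBag, then sorts by index to rebuild the original string.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class IndexedForEachDemo
{
    public static string RoundTrip (string text)
    {
        // Each iteration stores its element together with the loop index i.
        var pairs = new ConcurrentBag<(long Index, char Char)>();
        Parallel.ForEach (text, (c, state, i) => pairs.Add ((i, c)));

        // Sorting by index restores the original order.
        return new string (pairs.OrderBy (p => p.Index).Select (p => p.Char).ToArray());
    }
}
```

IndexedForEachDemo.RoundTrip ("Hello, world") should return "Hello, world" no matter how the iterations were scheduled.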
To put this into a practical context, let’s revisit the spellchecker that we wrote with
PLINQ. The following code loads up a dictionary along with an array of a million 
words to test: 


if (!File.Exists ("WordLookup.txt")) // Contains about 150,000 words 
new WebClient().DownloadFile ( 
"http://www.albahari.com/ispell/allwords.txt", "WordLookup.txt");


var wordLookup = new HashSet<string> (
File.ReadAllLines ("WordLookup.txt"),
StringComparer.InvariantCultureIgnoreCase);


var random = new Random(); 
string[] wordList = wordLookup.ToArray(); 


string[] wordsToTest = Enumerable.Range (0, 1000000) 
.Select (i => wordList [random.Next (0, wordList.Length) ]) 


.ToArray(); 
wordsToTest [12345] = "woozsh"; // Introduce a couple 
wordsToTest [23456] = "wubsie"; // of spelling mistakes. 


We can perform the spellcheck on our wordsToTest array using the indexed version
of Parallel.ForEach, as follows:


var misspellings = new ConcurrentBag<Tuple<int,string>>();

Parallel.ForEach (wordsToTest, (word, state, i) =>
{
if (!wordLookup.Contains (word))
misspellings.Add (Tuple.Create ((int) i, word));
});


Notice that we had to collate the results into a thread-safe collection: having to do 
this is the disadvantage when compared to using PLINQ. The advantage over 
PLINQ is that we avoid the cost of applying an indexed Select query operator— 
which is less efficient than an indexed ForEach. 
ParallelLoopState: breaking early out of loops


Because the loop body in a parallel For or ForEach is a delegate, you can't exit the
loop early with a break statement. Instead, you must call Break or Stop on a
ParallelLoopState object:


public class ParallelLoopState
{
public void Break();
public void Stop();

public bool IsExceptional { get; }
public bool IsStopped { get; }
public long? LowestBreakIteration { get; }
public bool ShouldExitCurrentIteration { get; }
}
Obtaining a ParallelLoopState is easy: all versions of For and ForEach are overloaded
to accept loop bodies of type Action<TSource, ParallelLoopState>. So, to
parallelize this:


foreach (char c in "Hello, world")
if (c == ',')
break;
else
Console.Write (c);
do this:
Parallel.ForEach ("Hello, world", (c, loopState) =>
{
if (c == ',')
loopState.Break();
else
Console.Write (c);
});


// OUTPUT: Hlloe 


You can see from the output that loop bodies can complete in a random order.
Aside from this difference, calling Break yields at least the same elements as executing
the loop sequentially: this example will always output at least the letters H, e, l, l,
and o in some order. In contrast, calling Stop instead of Break forces all threads to
finish immediately after their current iteration. In our example, calling Stop could
give us a subset of the letters H, e, l, l, and o if another thread were lagging behind.
Calling Stop is useful when you've found something that you're looking for—or
when something has gone wrong and you won't be looking at the results.


The Parallel.For and Parallel.ForEach methods return a
ParallelLoopResult object that exposes properties called
IsCompleted and LowestBreakIteration. These tell you
whether the loop ran to completion; if it didn't,
LowestBreakIteration indicates at what cycle the loop was broken.


If LowestBreakIteration returns null, it means that you 
called Stop (rather than Break) on the loop. 


If your loop body is long, you might want other threads to break partway through
the method body in case of an early Break or Stop. You can do this by polling the
ShouldExitCurrentIteration property at various places in your code; this property
becomes true immediately after a Stop—or soon after a Break.


ShouldExitCurrentIteration also becomes true after a cancellation
request—or if an exception is thrown in the loop.


IsExceptional lets you know whether an exception has occurred on another
thread. Any unhandled exception will cause the loop to stop after each thread’s current
iteration: to avoid this, you must explicitly handle exceptions in your code.
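The following sketch shows how Break and Stop surface in the ParallelLoopResult that Parallel.For returns. The LoopStateDemo class and its methods are hypothetical; the loop bodies do nothing except call Break or Stop at one chosen iteration.

```csharp
using System;
using System.Threading.Tasks;

public static class LoopStateDemo
{
    // Only iteration 'breakAt' calls Break, so every lower iteration still runs
    // and LowestBreakIteration reports breakAt.
    public static ParallelLoopResult BreakAt (int breakAt) =>
        Parallel.For (0, 1000, (i, state) =>
        {
            if (i == breakAt) state.Break();
        });

    // Calling Stop instead leaves LowestBreakIteration null.
    public static ParallelLoopResult StopAt (int stopAt) =>
        Parallel.For (0, 1000, (i, state) =>
        {
            if (i == stopAt) state.Stop();
        });
}
```

Calling LoopStateDemo.BreakAt (500) should report IsCompleted as false and LowestBreakIteration as 500; StopAt (500) should also report IsCompleted as false, but with LowestBreakIteration null.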
Optimization with local values


Parallel.For and Parallel.ForEach each offer a set of overloads that feature a 
generic type argument called TLocal. These overloads are designed to help you 
optimize the collation of data with iteration-intensive loops. The simplest is this: 


public static ParallelLoopResult For <TLocal> ( 
int fromInclusive, 
int toExclusive, 
Func <TLocal> localInit,
Func <int, ParallelLoopState, TLocal, TLocal> body, 
Action <TLocal> localFinally); 


These methods are rarely needed in practice because their target scenarios are covered
mostly by PLINQ (which is fortunate because these overloads are somewhat
intimidating!).


Essentially, the problem is this: suppose that we want to sum the square roots of the
numbers 1 through 10,000,000. Calculating 10 million square roots is easily parallelizable,
but summing their values is troublesome because we must lock around
updating the total:


object locker = new object(); 
double total = 0; 
Parallel.For (1, 10000000, 
i => { lock (locker) total += Math.Sqrt (i); }); 


The gain from parallelization is more than offset by the cost of obtaining 10 million 
locks—plus the resultant blocking. 


The reality, though, is that we don't actually need 10 million locks. Imagine a team 
of volunteers picking up a large volume of litter. If all workers shared a single trash 
can, the travel and contention would make the process extremely inefficient. The 
obvious solution is for each worker to have a private or “local” trash can, which is 
occasionally emptied into the main bin. 


The TLocal versions of For and ForEach work in exactly this way. The volunteers 
are internal worker threads, and the local value represents a local trash can. For 
Parallel to do this job, you must feed it two additional delegates that indicate the 
following: 

• How to initialize a new local value
• How to combine a local aggregation with the master value

Additionally, instead of the body delegate returning void, it should return the new
aggregate for the local value. Here's our example refactored:


object locker = new object();
double grandTotal = 0;

Parallel.For (1, 10000000,
() => 0.0, // Initialize the local value.
(i, state, localTotal) => // Body delegate. Notice that it
localTotal + Math.Sqrt (i), // returns the new local total.
localTotal => // Add the local value
{ lock (locker) grandTotal += localTotal; } // to the master value.
);
We must still lock, but only around aggregating the local value to the grand total. 
This makes the process dramatically more efficient. 
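Here is the same TLocal pattern applied to an integer count, as a sketch. Two assumptions to note: the CountMultiples helper is hypothetical, and in localFinally we have swapped the lock for Interlocked.Add, which suffices here because merging a local tally is a single addition to a long.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LocalValueDemo
{
    // Counts multiples of 'divisor' in [0, count). Each worker keeps a private
    // tally and merges it into the shared total once, when it finishes.
    public static long CountMultiples (int count, int divisor)
    {
        long total = 0;
        Parallel.For (0, count,
            () => 0L,                                                   // localInit
            (i, state, local) => i % divisor == 0 ? local + 1 : local,  // body returns new local
            local => Interlocked.Add (ref total, local));               // localFinally
        return total;
    }
}
```

For example, CountMultiples (10, 3) should return 4 (for 0, 3, 6, and 9), while touching the shared total only once per worker rather than once per iteration.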


As stated earlier, PLINQ is often a good fit in these scenarios.
Our example could be parallelized with PLINQ simply like
this:

ParallelEnumerable.Range (1, 10000000)
.Sum (i => Math.Sqrt (i))

(Notice that we used ParallelEnumerable to force range partitioning:
this improves performance in this case because all
numbers will take equally long to process.)


In more complex scenarios, you might use LINQ’s Aggregate
operator instead of Sum. If you supplied a local seed factory,
the situation would be somewhat analogous to providing a
local value function with Parallel.For.


Task Parallelism 


Task parallelism is the lowest-level approach to parallelization with PFX. The classes
for working at this level are defined in the System.Threading.Tasks namespace
and comprise the following:


Class                   Purpose
Task                    For managing a unit of work
Task<TResult>           For managing a unit of work with a return value
TaskFactory             For creating tasks
TaskFactory<TResult>    For creating tasks and continuations with the same return type
TaskScheduler           For managing the scheduling of tasks
TaskCompletionSource    For manually controlling a task’s workflow





The Task Parallel Library lets you create hundreds (or even 
thousands) of tasks with minimal overhead. But if you want to 
create millions of tasks, you'll need to partition those tasks 
into larger work units to maintain efficiency. The Parallel 
class and PLINQ do this automatically. 


We covered the basics of tasks in Chapter 14; in this section, we look at advanced 
features of tasks that are aimed at parallel programming: 
• Tuning a task’s scheduling
• Establishing a parent/child relationship when one task is started from another
• Advanced use of continuations
• TaskFactory


Visual Studio provides a window for monitoring tasks
(Debug→Windows→Parallel Tasks). This is equivalent to the
Threads window, but for tasks. The Parallel Stacks window
also has a special mode for tasks.


Creating and Starting Tasks 


As described in Chapter 14, Task.Run creates and starts a Task or Task<TResult>. 
This method is actually a shortcut for calling Task.Factory.StartNew, which 
allows greater flexibility through additional overloads. 


Specifying a state object 


Task.Factory.StartNew lets you specify a state object that is passed to the target.
The target method's signature must then comprise a single object-type parameter:


static void Main() 

{ 
var task = Task.Factory.StartNew (Greet, "Hello"); 
task.Wait(); // Wait for task to complete. 


} 
static void Greet (object state) { Console.Write (state); } // Hello


This avoids the cost of the closure required for executing a lambda expression that 
calls Greet. This is a micro-optimization and is rarely necessary in practice, so we 
can put the state object to better use, which is to assign a meaningful name to the 
task. We can then use the AsyncState property to query its name: 


static void Main() 


{ 
var task = Task.Factory.StartNew (state => Greet ("Hello"), "Greeting"); 


Console.WriteLine (task.AsyncState); // Greeting 
task.Wait(); 
} 


static void Greet (string message) { Console.Write (message); } 


Visual Studio displays each task’s AsyncState in the Parallel
Tasks window, so having a meaningful name here can ease
debugging considerably.
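As a minimal sketch of naming tasks this way, the hypothetical StartNamed helper below passes a name as the state object and runs an arbitrary Action. The lambda receives the name as its state argument but ignores it; the name is read back afterward through AsyncState.

```csharp
using System;
using System.Threading.Tasks;

public static class NamedTaskDemo
{
    // The second argument to StartNew becomes the task's AsyncState.
    public static Task StartNamed (string name, Action work) =>
        Task.Factory.StartNew (state => work(), name);
}
```

For example, NamedTaskDemo.StartNamed ("Greeting", () => Console.Write ("Hello")) returns a task whose AsyncState is "Greeting".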
TaskCreationOptions


You can tune a task’s execution by specifying a TaskCreationOptions enum when 
calling StartNew (or instantiating a Task). TaskCreationOptions is a flags enum 
with the following (combinable) values: 


LongRunning, PreferFairness, AttachedToParent 


LongRunning suggests to the scheduler to dedicate a thread to the task, and as we 
described in Chapter 14, this is beneficial for I/O-bound tasks and for long-running 
tasks that might otherwise force short-running tasks to wait an unreasonable 
amount of time before being scheduled. 


PreferFairness instructs the scheduler to try to ensure that tasks are scheduled in
the order in which they were started. It might ordinarily do otherwise because it
internally optimizes the scheduling of tasks using local work-stealing queues—an
optimization that allows the creation of child tasks without incurring the contention
overhead that would otherwise arise with a single work queue. A child task is created
by specifying AttachedToParent.
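Because TaskCreationOptions is a flags enum, you combine values with the | operator. The sketch below (the helper name is hypothetical) starts a task with both LongRunning and PreferFairness. The scheduler treats these as hints, but the task records what you asked for in its CreationOptions property.

```csharp
using System;
using System.Threading.Tasks;

public static class CreationOptionsDemo
{
    // Flags combine with '|'; StartNew (Action, TaskCreationOptions) accepts the result.
    public static Task StartLongRunningFair (Action work) =>
        Task.Factory.StartNew (work,
            TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
```

You can test for an individual option afterward with, for example, task.CreationOptions.HasFlag (TaskCreationOptions.LongRunning).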


Child tasks 


When one task starts another, you can optionally establish a parent-child 
relationship: 


Task parent = Task.Factory.StartNew (() =>
{
Console.WriteLine ("I am a parent");

Task.Factory.StartNew (() => // Detached task
{
Console.WriteLine ("I am detached");
});

Task.Factory.StartNew (() => // Child task
{
Console.WriteLine ("I am a child");
}, TaskCreationOptions.AttachedToParent);
});


A child task is special in that when you wait for the parent task to complete, it waits
for any children as well, at which point any child exceptions bubble up:


TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;
var parent = Task.Factory.StartNew (() =>
{
Task.Factory.StartNew (() => // Child
{
Task.Factory.StartNew (() => { throw null; }, atp); // Grandchild
}, atp);
});

// The following call throws a NullReferenceException (wrapped
// in nested AggregateExceptions):
parent.Wait();


This can be particularly useful when a child task is a continuation, as you'll see 
shortly. 
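The sketch below checks the waiting behavior directly: the child sleeps briefly so that it outlives the parent's own body, yet parent.Wait() does not return until the child has finished. The parent is started with Task.Factory.StartNew rather than Task.Run, because Task.Run creates tasks with DenyChildAttach, which would quietly turn the child into a detached task. The ChildTaskDemo name is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ChildTaskDemo
{
    // Returns true if the attached child had finished by the time
    // parent.Wait() returned.
    public static bool ParentWaitsForChild()
    {
        bool childDone = false;

        Task parent = Task.Factory.StartNew (() =>
        {
            Task.Factory.StartNew (() =>
            {
                Thread.Sleep (200);     // child keeps running after the parent's body ends
                childDone = true;
            }, TaskCreationOptions.AttachedToParent);
        });

        parent.Wait();                  // also waits for the attached child
        return childDone;
    }
}
```

If you remove AttachedToParent, the method would most likely return false, because the parent would complete as soon as its own body ends.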


Waiting on Multiple Tasks 


We saw in Chapter 14 that you can wait on a single task either by calling its Wait
method, or accessing its Result property (if it’s a Task<TResult>). You can also wait
on multiple tasks at once—via the static methods Task.WaitAll (waits for all the
specified tasks to finish) and Task.WaitAny (waits for just one task to finish).


WaitAll is similar to waiting out each task in turn, but is more efficient in that it
requires (at most) just one context switch. Also, if one or more of the tasks throw an
unhandled exception, WaitAll still waits out every task. It then rethrows an
AggregateException that accumulates the exceptions from each faulted task (this is
where AggregateException is genuinely useful). It’s equivalent to doing this:


// Assume t1, t2 and t3 are tasks: 

var exceptions = new List<Exception>(); 

try { t1.Wait(); } catch (AggregateException ex) { exceptions.Add (ex); } 
try { t2.Wait(); } catch (AggregateException ex) { exceptions.Add (ex); } 
try { t3.Wait(); } catch (AggregateException ex) { exceptions.Add (ex); } 
if (exceptions.Count > 0) throw new AggregateException (exceptions);


Calling WaitAny is equivalent to waiting on a ManualResetEventSlim that’s signaled
by each task as it finishes.
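The following sketch shows WaitAll collecting failures from more than one task. The WaitAllDemo name is hypothetical; two of the three tasks throw, and the AggregateException that WaitAll rethrows should carry both of their exceptions in InnerExceptions.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitAllDemo
{
    // Returns the number of exceptions carried by WaitAll's AggregateException.
    public static int CountFailures()
    {
        Task t1 = Task.Factory.StartNew (() => { throw new InvalidOperationException ("t1"); });
        Task t2 = Task.Factory.StartNew (() => { });      // succeeds
        Task t3 = Task.Factory.StartNew (() => { throw new ArgumentException ("t3"); });

        try
        {
            Task.WaitAll (t1, t2, t3);   // waits for all three, even after t1 faults
            return 0;
        }
        catch (AggregateException ex)
        {
            return ex.InnerExceptions.Count;
        }
    }
}
```

The successful task t2 contributes nothing to the AggregateException; only the two faulted tasks do.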


As well as a timeout, you can also pass in a cancellation token to the Wait methods: 
this lets you cancel the wait—not the task itself. 


Canceling Tasks 


You can optionally pass in a cancellation token when starting a task. Then, if cancellation
occurs via that token, the task itself enters the Canceled state:


var cts = new CancellationTokenSource(); 
CancellationToken token = cts.Token; 
cts.CancelAfter (500); 


Task task = Task.Factory.StartNew (() => 
{ 

Thread.Sleep (1000); 

token.ThrowIfCancellationRequested(); // Check for cancellation request
}, token); 


try { task.Wait(); }
catch (AggregateException ex)
{
Console.WriteLine (ex.InnerException is TaskCanceledException); // True
Console.WriteLine (task.IsCanceled); // True
Console.WriteLine (task.Status); // Canceled
}


TaskCanceledException is a subclass of OperationCanceledException. If you want
to explicitly throw an OperationCanceledException (rather than calling
token.ThrowIfCancellationRequested), you must pass the cancellation token into
OperationCanceledException’s constructor. If you fail to do this, the task won't end
up with a TaskStatus.Canceled status and won't trigger OnlyOnCanceled
continuations.


If the task is canceled before it has started, it won't get scheduled—an
OperationCanceledException will instead be thrown on the task immediately.
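The two rules above can be observed with a short sketch (the CancellationStatusDemo class is hypothetical). In the first method the token is canceled before StartNew is called, so the task never runs; in the second, the task throws an OperationCanceledException without any token being involved, so it ends up Faulted rather than Canceled.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationStatusDemo
{
    // Token already canceled: the task is never scheduled.
    public static TaskStatus CanceledBeforeStart()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task task = Task.Factory.StartNew (() => { }, cts.Token);
        try { task.Wait(); } catch (AggregateException) { }   // wraps a TaskCanceledException
        return task.Status;
    }

    // No token passed to StartNew or to the exception's constructor.
    public static TaskStatus ThrownWithoutToken()
    {
        Task task = Task.Factory.StartNew (() => { throw new OperationCanceledException(); });
        try { task.Wait(); } catch (AggregateException) { }
        return task.Status;
    }
}
```

The first method should return TaskStatus.Canceled and the second TaskStatus.Faulted.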


Because cancellation tokens are recognized by other APIs, you can pass them into 
other constructs and cancellations will propagate seamlessly: 


var cancelSource = new CancellationTokenSource();
CancellationToken token = cancelSource.Token;

Task task = Task.Factory.StartNew (() =>
{
// Pass our cancellation token into a PLINQ query:
var query = someSequence.AsParallel().WithCancellation (token)...
... enumerate query ...
});


Calling Cancel on cancelSource in this example will cancel the PLINQ query, 
which will throw an OperationCanceledException on the task body, which will 
then cancel the task. 


The cancellation tokens that you can pass into methods such
as Wait and CancelAndWait allow you to cancel the wait operation
and not the task itself.


Continuations 
The ContinueWith method executes a delegate immediately after a task ends: 


Task task1 = Task.Factory.StartNew (() => Console.Write ("antecedent.."));
Task task2 = task1.ContinueWith (ant => Console.Write ("..continuation"));


As soon as task1 (the antecedent) completes, fails, or is canceled, task2 (the
continuation) starts. (If task1 had completed before the second line of code ran, task2
would be scheduled to execute immediately.) The ant argument passed to the
continuation’s lambda expression is a reference to the antecedent task. ContinueWith
itself returns a task, making it easy to add further continuations.


By default, antecedent and continuation tasks may execute on different threads. You
can force them to execute on the same thread by specifying
TaskContinuationOptions.ExecuteSynchronously when calling ContinueWith: this can
improve performance in very fine-grained continuations by lessening indirection.







Continuations and Task<TResult> 


Just like ordinary tasks, continuations can be of type Task<TResult> and return 
data. In the following example, we calculate Math.Sqrt(8*2) using a series of 
chained tasks and then write out the result: 


Task.Factory.StartNew<int> (() => 8) 
.ContinueWith (ant => ant.Result * 2) 
.ContinueWith (ant => Math.Sqrt (ant.Result)) 
.ContinueWith (ant => Console.WriteLine (ant.Result)); // 4


Our example is somewhat contrived for simplicity; in real life, these lambda expressions
would call computationally intensive functions.
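To consume the chain's value rather than print it, you can drop the final Console.WriteLine continuation and read Result from the last task. This sketch assumes a hypothetical ContinuationChainDemo helper that takes the starting value as a parameter; reading Result blocks until the whole chain has run.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationChainDemo
{
    // Same chain as above minus the printing step: the last continuation is a
    // Task<double>, so its Result can be returned to the caller.
    public static double SqrtOfDoubled (int seed) =>
        Task.Factory.StartNew<int> (() => seed)
            .ContinueWith (ant => ant.Result * 2)            // Task<int>
            .ContinueWith (ant => Math.Sqrt (ant.Result))    // Task<double>
            .Result;                                         // blocks until the chain finishes
}
```

SqrtOfDoubled (8) should return 4, matching the output of the chained example.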


Continuations and exceptions 


A continuation can know whether an antecedent faulted by querying the antecedent
task’s Exception property—or simply by invoking Result / Wait and catching the
resultant AggregateException. If an antecedent faults and the continuation does
neither, the exception is considered unobserved and the static
TaskScheduler.UnobservedTaskException event fires when the task is later garbage-collected.


A safe pattern is to rethrow antecedent exceptions. As long as the continuation is
waited upon, the exception will be propagated and rethrown to the waiter:


Task continuation = Task.Factory.StartNew (() => { throw null; })
.ContinueWith (ant =>
{
ant.Wait();
// Continue processing...
});

continuation.Wait(); // Exception is now thrown back to caller.


Another way to deal with exceptions is to specify different continuations for exceptional
versus nonexceptional outcomes. This is done with TaskContinuationOptions:


Task task1 = Task.Factory.StartNew (() => { throw null; }); 


Task error = task1.ContinueWith (ant => Console.Write (ant.Exception),
TaskContinuationOptions.OnlyOnFaulted);


Task ok = task1.ContinueWith (ant => Console.Write ("Success!"),
TaskContinuationOptions.NotOnFaulted);


This pattern is particularly useful in conjunction with child tasks, as you'll see very 
soon. The following extension method “swallows” a task’s unhandled exceptions: 


public static void IgnoreExceptions (this Task task)
{
task.ContinueWith (t => { var ignore = t.Exception; },
TaskContinuationOptions.OnlyOnFaulted);
}


(This could be improved by adding code to log the exception.) Here's how it would
be used:


Task.Factory.StartNew (() => { throw null; }).IgnoreExceptions(); 


Continuations and child tasks 


A powerful feature of continuations is that they kick off only when all child tasks
have completed (see Figure 23-5). At that point, any exceptions thrown by the
children are marshaled to the continuation.


In the following example, we start three child tasks, each throwing a
NullReferenceException. We then catch all of them in one fell swoop via a continuation
on the parent:


TaskCreationOptions atp = TaskCreationOptions.AttachedToParent;

Task.Factory.StartNew (() =>
{
Task.Factory.StartNew (() => { throw null; }, atp);
Task.Factory.StartNew (() => { throw null; }, atp);
Task.Factory.StartNew (() => { throw null; }, atp);
})
.ContinueWith (p => Console.WriteLine (p.Exception),
TaskContinuationOptions.OnlyOnFaulted);
Figure 23-5. Continuations
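The sketch below (a hypothetical ChildExceptionDemo class) counts the failures that reach the parent's continuation instead of printing them. Each child's exception arrives wrapped in its own AggregateException inside the parent's, so Flatten is used to unwrap the nesting before counting. We throw NullReferenceException explicitly rather than writing throw null, purely to make the exception type obvious.

```csharp
using System;
using System.Threading.Tasks;

public static class ChildExceptionDemo
{
    public static int CountChildFailures()
    {
        var atp = TaskCreationOptions.AttachedToParent;

        Task<int> failures = Task.Factory.StartNew (() =>
        {
            Task.Factory.StartNew (() => { throw new NullReferenceException(); }, atp);
            Task.Factory.StartNew (() => { throw new NullReferenceException(); }, atp);
            Task.Factory.StartNew (() => { throw new NullReferenceException(); }, atp);
        })
        .ContinueWith (p => p.Exception.Flatten().InnerExceptions.Count,
                       TaskContinuationOptions.OnlyOnFaulted);

        // The continuation runs only after all three children have finished.
        return failures.Result;
    }
}
```

CountChildFailures should return 3: one NullReferenceException per child.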







Conditional continuations 


By default, a continuation is scheduled unconditionally, whether the antecedent
completes, throws an exception, or is canceled. You can alter this behavior via a set
of (combinable) flags included within the TaskContinuationOptions enum. Following
are the three core flags that control conditional continuation:


NotOnRanToCompletion = 0x10000, 

NotOnFaulted = 0x20000, 

NotOnCanceled = 0x40000, 
These flags are subtractive in the sense that the more you apply, the less likely the
continuation is to execute. For convenience, there are also the following precombined
values:


OnlyOnRanToCompletion = NotOnFaulted | NotOnCanceled, 
OnlyOnFaulted = NotOnRanToCompletion | NotOnCanceled, 
OnlyOnCanceled = NotOnRanToCompletion | NotOnFaulted 


(Combining all the Not* flags [NotOnRanToCompletion, NotOnFaulted, NotOnCanceled]
is nonsensical because it would result in the continuation always being
canceled.)


RanToCompletion means that the antecedent succeeded without cancellation or 
unhandled exceptions. 


Faulted means that an unhandled exception was thrown on the antecedent. 


Canceled means one of two things: 


• The antecedent was canceled via its cancellation token. In other words, an
OperationCanceledException was thrown on the antecedent, whose
CancellationToken property matched that passed to the antecedent when it
was started.

• The antecedent was implicitly canceled because it didn’t satisfy a conditional
continuation predicate.


It's essential to grasp that when a continuation doesn’t execute by virtue of these 
flags, the continuation is not forgotten or abandoned—it’s canceled. This means that 
any continuations on the continuation itself will then run unless you predicate them 
with NotOnCanceled. For example, consider this: 


Task t1 = Task.Factory.StartNew (...); 


Task fault = t1.ContinueWith (ant => Console.WriteLine ("fault"),
TaskContinuationOptions.OnlyOnFaulted);


Task t3 = fault.ContinueWith (ant => Console.WriteLine ("t3")); 


As it stands, t3 will always get scheduled—even if t1 doesn’t throw an exception
(see Figure 23-6). This is because if t1 succeeds, the fault task will be canceled, and
with no continuation restrictions placed on t3, t3 will then execute unconditionally.
Figure 23-6. Conditional continuations


If we want t3 to execute only if fault actually runs, we must instead do this: 


Task t3 = fault.ContinueWith (ant => Console.WriteLine ("t3"),
TaskContinuationOptions.NotOnCanceled);
(Alternatively, we could specify OnlyOnRanToCompletion; the difference is that t3 
would not then execute if an exception were thrown within fault.) 
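You can confirm this behavior with the following sketch, a hypothetical ConditionalContinuationDemo built from the t1/fault/t3 example (the t3Guarded task is our addition, using NotOnCanceled). Because t1 succeeds, fault should end up Canceled, the unrestricted t3 should still run to completion, and the NotOnCanceled version should itself be canceled.

```csharp
using System;
using System.Threading.Tasks;

public static class ConditionalContinuationDemo
{
    public static (TaskStatus Fault, TaskStatus T3, TaskStatus T3Guarded) Run()
    {
        Task t1 = Task.Factory.StartNew (() => { });               // succeeds
        Task fault = t1.ContinueWith (ant => Console.WriteLine ("fault"),
                                      TaskContinuationOptions.OnlyOnFaulted);
        Task t3 = fault.ContinueWith (ant => { });                 // unrestricted
        Task t3Guarded = fault.ContinueWith (ant => { },
                                             TaskContinuationOptions.NotOnCanceled);

        // t3Guarded ends up Canceled, which makes WaitAll throw, so we
        // swallow the AggregateException and inspect the statuses instead.
        try { Task.WaitAll (t3, t3Guarded); } catch (AggregateException) { }
        return (fault.Status, t3.Status, t3Guarded.Status);
    }
}
```

The expected statuses are Canceled, RanToCompletion, and Canceled respectively.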


Continuations with multiple antecedents 


You can schedule a continuation to execute based on the completion of multiple
antecedents with the ContinueWhenAll and ContinueWhenAny methods in the TaskFactory
class. These methods have become redundant, however, with the
introduction of the task combinators discussed in Chapter 14 (WhenAll and
WhenAny). Specifically, given the following tasks:


var task1 = Task.Run (() => Console.Write ("X")); 
var task2 = Task.Run (() => Console.Write ("Y")); 


we can schedule a continuation to execute when both complete as follows: 


var continuation = Task.Factory.ContinueWhenAll (
new[] { task1, task2 }, tasks => Console.WriteLine ("Done"));


Here’s the same result with the WhenAll task combinator:

var continuation = Task.WhenAll (task1, task2)
                       .ContinueWith (ant => Console.WriteLine ("Done"));
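The same substitution works for ContinueWhenAny. In this sketch the two tasks are illustrative (one finishes immediately, the other after about two seconds), and both versions report the first task to finish. Note that Task.WhenAny returns a Task<Task<string>>, hence the double .Result:

```csharp
using System;
using System.Threading.Tasks;

Task<string> fast = Task.Run (() => "fast");
Task<string> slow = Task.Run (async () => { await Task.Delay (2000); return "slow"; });

// Older style: TaskFactory.ContinueWhenAny
Task<string> viaFactory = Task.Factory.ContinueWhenAny (
  new[] { fast, slow }, winner => winner.Result);

// Combinator style: WhenAny yields the winning task itself
Task<string> viaCombinator = Task.WhenAny (fast, slow)
                                 .ContinueWith (ant => ant.Result.Result);

Console.WriteLine (viaFactory.Result);      // fast (in practice)
Console.WriteLine (viaCombinator.Result);   // fast (in practice)
```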

Multiple continuations on a single antecedent


Calling ContinueWith more than once on the same task creates multiple
continuations on a single antecedent. When the antecedent finishes, all continuations will
start together (unless you specify TaskContinuationOptions.ExecuteSynchronously,
in which case the continuations will execute sequentially).


The following waits for one second and then writes either XY or YX: 


var t = Task.Factory.StartNew (() => Thread.Sleep (1000)); 
t.ContinueWith (ant => Console.Write ("X")); 
t.ContinueWith (ant => Console.Write ("Y")); 
Task Schedulers


A task scheduler allocates tasks to threads and is represented by the abstract
TaskScheduler class. .NET Core provides two concrete implementations: the
default scheduler that works in tandem with the CLR thread pool, and the
synchronization context scheduler. The latter is designed (primarily) to help you with the
threading model of WPF and Windows Forms, which requires that user interface
elements and controls are accessed only from the thread that created them (see
“Threading in Rich Client Applications” on page 588 in Chapter 14). By capturing
a synchronization context scheduler, we can instruct a task or a continuation to
execute on that context:


// Suppose we are on a UI thread in a Windows Forms / WPF application: 
_uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();


Assuming Foo is a compute-bound method that returns a string and lblResult is a
WPF or Windows Forms label, we could then safely update the label after the
operation completes, as follows:


Task.Run (() => Foo())
    .ContinueWith (ant => lblResult.Content = ant.Result, _uiScheduler);

Of course, C#’s asynchronous functions would more commonly be used for this
kind of thing. It’s also possible to write our own task scheduler (by subclassing
TaskScheduler), although this is something you’d do only in very specialized
scenarios. For custom scheduling, you’d more commonly use TaskCompletionSource.
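As a sketch of that last point, the hypothetical helper below (RunOnDedicatedThread is not a .NET API) runs a function on a thread it creates itself and exposes the outcome through TaskCompletionSource. Callers then get the result, or the exception, through the usual Task machinery without any custom TaskScheduler:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs func on a new dedicated thread and
// reports its result (or exception) through a TaskCompletionSource.
Task<TResult> RunOnDedicatedThread<TResult> (Func<TResult> func)
{
  var tcs = new TaskCompletionSource<TResult>();
  var thread = new Thread (() =>
  {
    try { tcs.SetResult (func()); }
    catch (Exception ex) { tcs.SetException (ex); }
  });
  thread.IsBackground = true;
  thread.Start();
  return tcs.Task;
}

Task<int> answer = RunOnDedicatedThread (() => 6 * 7);
Console.WriteLine (answer.Result);   // 42

Task<int> failed = RunOnDedicatedThread<int> (() => throw new InvalidOperationException ("boom"));
try { failed.Wait(); }
catch (AggregateException aex) { Console.WriteLine (aex.InnerException.Message); }   // boom
```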


TaskFactory 


When you call Task.Factory, you’re calling a static property on Task that returns a
default TaskFactory object. The purpose of a task factory is to create tasks;
specifically, three kinds of tasks:


• “Ordinary” tasks (via StartNew)


• Continuations with multiple antecedents (via ContinueWhenAll and
  ContinueWhenAny)


• Tasks that wrap methods that follow the defunct APM (via FromAsync; see
  “Obsolete Patterns” on page 633 in Chapter 14).


Another way to create tasks is to instantiate Task and call Start. However, this lets 
you create only “ordinary” tasks, not continuations. 


Creating your own task factories 


TaskFactory is not an abstract factory: you can actually instantiate the class, and
this is useful when you want to repeatedly create tasks using the same (nonstandard)
values for TaskCreationOptions, TaskContinuationOptions, or TaskScheduler.
For example, if we want to repeatedly create long-running parented
tasks, we could create a custom factory as follows:
var factory = new TaskFactory (
  TaskCreationOptions.LongRunning | TaskCreationOptions.AttachedToParent,
  TaskContinuationOptions.None);


Creating tasks is then simply a matter of calling StartNew on the factory: 


Task task1 = factory.StartNew (Method1); 
Task task2 = factory.StartNew (Method2); 


The custom continuation options are applied when calling ContinueWhenAll and
ContinueWhenAny.
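As a runnable sketch (LongRunning is chosen only for illustration), every task this factory creates picks up its default creation options, and a ContinueWhenAll continuation issued through the same factory combines the results:

```csharp
using System;
using System.Threading.Tasks;

var factory = new TaskFactory (
  TaskCreationOptions.LongRunning,
  TaskContinuationOptions.None);

Task<int> t1 = factory.StartNew (() => 1);
Task<int> t2 = factory.StartNew (() => 2);

// The factory's default creation options are stamped on each task:
Console.WriteLine (t1.CreationOptions);   // LongRunning

// Continuations created by the factory use its continuation options:
Task<int> sum = factory.ContinueWhenAll (
  new[] { t1, t2 }, tasks => tasks[0].Result + tasks[1].Result);

Console.WriteLine (sum.Result);   // 3
```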


Working with AggregateException 


As we've seen, PLINQ, the Parallel class, and Tasks automatically marshal excep- 
tions to the consumer. To see why this is essential, consider the following LINQ 
query, which throws a DivideByZeroException on the first iteration: 


try
{
  var query = from i in Enumerable.Range (0, 1000000)
              select 100 / i;
  // Enumerate query
  ...
}
catch (DivideByZeroException)
{
  ...
}


If we asked PLINQ to parallelize this query and it ignored the handling of excep- 
tions, a DivideByZeroException would probably be thrown on a separate thread, 
bypassing our catch block and causing the application to die. 


Hence, exceptions are automatically caught and rethrown to the caller. But
unfortunately, it’s not quite as simple as catching a DivideByZeroException. Because these
libraries utilize many threads, it’s actually possible for two or more exceptions to be
thrown simultaneously. To ensure that all exceptions are reported, exceptions are
therefore wrapped in an AggregateException container, which exposes an
InnerExceptions property containing each of the caught exceptions:


try
{
  var query = from i in ParallelEnumerable.Range (0, 1000000)
              select 100 / i;
  // Enumerate query
}
catch (AggregateException aex)
{
  foreach (Exception ex in aex.InnerExceptions)
    Console.WriteLine (ex.Message);
}
Both PLINQ and the Parallel class end the query or loop
execution upon encountering the first exception, by not
processing any further elements or loop bodies. More exceptions
might be thrown, however, before the current cycle is
complete. The first exception in AggregateException is visible in
the InnerException property.
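To see more than one inner exception in practice, this sketch faults two independent tasks (the FailA and FailB local functions are illustrative) and waits on both with Task.WaitAll, which gathers every fault into a single AggregateException:

```csharp
using System;
using System.Threading.Tasks;

void FailA() => throw new InvalidOperationException ("A failed");
void FailB() => throw new ArgumentException ("B failed");

Task a = Task.Run (FailA);
Task b = Task.Run (FailB);

int caught = 0;
try
{
  Task.WaitAll (a, b);
}
catch (AggregateException aex)
{
  // One inner exception per faulted task (order not guaranteed):
  foreach (Exception ex in aex.InnerExceptions)
  {
    Console.WriteLine (ex.GetType().Name + ": " + ex.Message);
    caught++;
  }
}
Console.WriteLine (caught);   // 2
```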


Flatten and Handle 


The AggregateException class provides a couple of methods to simplify exception 
handling: Flatten and Handle. 


Flatten 


AggregateExceptions will quite often contain other AggregateExceptions. An 
example of when this might happen is if a child task throws an exception. You can 
eliminate any level of nesting to simplify handling by calling Flatten. This method 
returns a new AggregateException with a simple flat list of inner exceptions: 


catch (AggregateException aex)
{
  foreach (Exception ex in aex.Flatten().InnerExceptions)
    myLogWriter.LogException (ex);
}
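The effect of Flatten is easy to see with hand-built exceptions. This sketch constructs the nested AggregateException directly, rather than obtaining one from child tasks, purely for illustration:

```csharp
using System;

var inner = new AggregateException (
  new InvalidOperationException ("x"), new ArgumentException ("y"));
var outer = new AggregateException (inner, new FormatException ("z"));

// Before flattening, one of the two inner exceptions is itself an AggregateException:
Console.WriteLine (outer.InnerExceptions.Count);             // 2

// Flatten unwraps every level of nesting into a single list:
Console.WriteLine (outer.Flatten().InnerExceptions.Count);   // 3
```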


Handle 


Sometimes, it’s useful to catch only specific exception types and have other types 
rethrown. The Handle method on AggregateException provides a shortcut for 
doing this. It accepts an exception predicate, which it runs over every inner 
exception: 


public void Handle (Func<Exception, bool> predicate) 


If the predicate returns true, it considers that exception “handled.” After the
delegate has run over every exception, the following happens:


• If all exceptions were “handled” (the delegate returned true), the exception is
  not rethrown.


• If there were any exceptions for which the delegate returned false
  (“unhandled”), a new AggregateException is built up containing those
  exceptions and is rethrown.


For instance, the following ends up rethrowing another AggregateException that
contains a single NullReferenceException:


var parent = Task.Factory.StartNew (() =>
{
  // We'll throw 3 exceptions at once using 3 child tasks:

  int[] numbers = { 0 };

  var childFactory = new TaskFactory
   (TaskCreationOptions.AttachedToParent, TaskContinuationOptions.None);

  childFactory.StartNew (() => 5 / numbers[0]);   // Division by zero
  childFactory.StartNew (() => numbers [1]);      // Index out of range
  childFactory.StartNew (() => { throw null; });  // Null reference
});

try { parent.Wait(); }
catch (AggregateException aex)
{
  aex.Flatten().Handle (ex =>   // Note that we still need to call Flatten
  {
    if (ex is DivideByZeroException)
    {
      Console.WriteLine ("Divide by zero");
      return true;                           // This exception is "handled"
    }
    if (ex is IndexOutOfRangeException)
    {
      Console.WriteLine ("Index out of range");
      return true;                           // This exception is "handled"
    }
    return false;    // All other exceptions will get rethrown
  });
}
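Stripped of the tasks, the rethrow behavior can be checked directly. In this sketch the AggregateException is constructed by hand for illustration; Handle marks the DivideByZeroException as handled and rethrows a new AggregateException holding only the NullReferenceException:

```csharp
using System;

var aex = new AggregateException (
  new DivideByZeroException(), new NullReferenceException());

int remaining = -1;
try
{
  // Treat only DivideByZeroException as handled:
  aex.Handle (ex => ex is DivideByZeroException);
}
catch (AggregateException rethrown)
{
  remaining = rethrown.InnerExceptions.Count;
  Console.WriteLine (rethrown.InnerExceptions[0].GetType().Name);   // NullReferenceException
}
Console.WriteLine (remaining);   // 1
```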


Concurrent Collections 


.NET Core offers thread-safe collections in the System.Collections.Concurrent 
namespace: 


Concurrent collection                  Nonconcurrent equivalent
ConcurrentStack<T>                     Stack<T>
ConcurrentQueue<T>                     Queue<T>
ConcurrentBag<T>                       (none)
ConcurrentDictionary<TKey,TValue>      Dictionary<TKey,TValue>

The concurrent collections are optimized for high-concurrency scenarios; however, 
they can also be useful whenever you need a thread-safe collection (as an alternative 
to locking around an ordinary collection). There are some caveats, though: 


• The conventional collections outperform the concurrent collections in all but
  highly concurrent scenarios.


• A thread-safe collection doesn’t guarantee that the code using it will be
  thread-safe (see “Locking and Thread Safety” on page 890 in Chapter 22).
• If you enumerate over a concurrent collection while another thread is modifying
  it, no exception is thrown; instead, you get a mixture of old and new content.


• There’s no concurrent version of List<T>.


• The concurrent stack, queue, and bag classes are implemented internally with
  linked lists. This makes them less memory-efficient than the nonconcurrent
  Stack and Queue classes, but better for concurrent access because linked lists
  are conducive to lock-free or low-lock implementations. (This is because
  inserting a node into a linked list requires updating just a couple of references,
  whereas inserting an element into a List<T>-like structure might require moving
  thousands of existing elements.)


In other words, these collections are not merely shortcuts for using an ordinary col- 
lection with a lock. To demonstrate, if we execute the following code on a single 
thread: 


var d = new ConcurrentDictionary<int,int>(); 
for (int i = 0; i < 1000000; i++) d[i] = 123; 


it runs three times more slowly than this: 


var d = new Dictionary<int,int>(); 
for (int i = 0; i < 1000000; i++) lock (d) d[i] = 123; 


(Reading from a ConcurrentDictionary, however, is fast because reads are lock- 
free.) 




The concurrent collections also differ from conventional collections in that they
expose special methods to perform atomic test-and-act operations, such as TryPop.
Most of these methods are unified via the IProducerConsumerCollection<T>
interface.


IProducerConsumerCollection<T>


A producer/consumer collection is one for which the two primary use cases are: 


• Adding an element (producing)


• Retrieving an element while removing it (consuming)


The classic examples are stacks and queues. Producer/consumer collections are sig- 
nificant in parallel programming because they’re conducive to efficient lock-free 
implementations. 


The IProducerConsumerCollection<T> interface represents a thread-safe pro- 
ducer/consumer collection. The following classes implement this interface: 


ConcurrentStack<T> 
ConcurrentQueue<T> 
ConcurrentBag<T> 
IProducerConsumerCollection<T> extends ICollection, adding the following
methods:


void CopyTo (T[] array, int index);
T[] ToArray();
bool TryAdd (T item);
bool TryTake (out T item);


The TryAdd and TryTake methods test whether an add/remove operation can be 
performed; if so, they perform the add/remove. The testing and acting are atomi- 
cally performed, eliminating the need to lock as you would around a conventional 
collection: 


int result; 
lock (myStack) if (myStack.Count > 0) result = myStack.Pop();


TryTake returns false if the collection is empty. TryAdd always succeeds and
returns true in the three implementations provided. If you wrote your own
concurrent collection that prohibited duplicates, however, you’d make TryAdd return
false if the element already existed (an example would be if you wrote a concurrent set).


The particular element that TryTake removes is defined by the subclass: 


• With a stack, TryTake removes the most recently added element.

• With a queue, TryTake removes the least recently added element.

• With a bag, TryTake removes whatever element it can remove most efficiently.


The three concrete classes mostly implement the TryTake and TryAdd methods 
explicitly, exposing the same functionality through more specifically named public 
methods such as TryDequeue and TryPop. 
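A short sketch on a ConcurrentStack<T> (single-threaded here, just to show the semantics): the public TryPop and the explicitly implemented IProducerConsumerCollection<T> methods perform the same atomic test-and-act, and the interface methods are reached by casting to the interface:

```csharp
using System;
using System.Collections.Concurrent;

var stack = new ConcurrentStack<int>();
stack.Push (1);
stack.Push (2);

// Test-and-act as one atomic call; no lock needed:
if (stack.TryPop (out int top))
  Console.WriteLine (top);          // 2

// The same operations through the interface (implemented explicitly):
IProducerConsumerCollection<int> pc = stack;
pc.TryAdd (3);                      // Always returns true for ConcurrentStack<T>
pc.TryTake (out int taken);
Console.WriteLine (taken);          // 3 (most recently added)
Console.WriteLine (stack.Count);    // 1
```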


ConcurrentBag<T> 


ConcurrentBag<T> stores an unordered collection of objects (with duplicates per- 
mitted). ConcurrentBag<T> is suitable in situations for which you don’t care which 
element you get when calling Take or TryTake. 


The benefit of ConcurrentBag<T> over a concurrent queue or stack is that a bag’s 
Add method suffers almost no contention when called by many threads at once. In 
contrast, calling Add in parallel on a queue or stack incurs some contention 
(although a lot less than locking around a nonconcurrent collection). Calling Take 
on a concurrent bag is also very efficient—as long as each thread doesn’t take more 
elements than it Added. 


Inside a concurrent bag, each thread gets its own private linked list. Elements are 
added to the private list that belongs to the thread calling Add, eliminating 
contention. When you enumerate over the bag, the enumerator travels through each 
thread's private list, yielding each of its elements in turn. 
When you call Take, the bag first looks at the current thread’s private list. If there’s at
least one element,¹ it can complete the task easily and without contention. But if the
list is empty, it must “steal” an element from another thread’s private list and incur 
the potential for contention. 


So, to be precise, calling Take gives you the element added most recently on that 
thread; if there are no elements on that thread, it gives you the element added most 
recently on another thread, chosen at random. 
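On a single thread, this means a bag hands elements back in last-in, first-out order, as this small sketch illustrates (no other threads are involved, so no stealing occurs):

```csharp
using System;
using System.Collections.Concurrent;

var bag = new ConcurrentBag<int>();

// All three Adds happen on this thread, so they go to its private list:
bag.Add (1);
bag.Add (2);
bag.Add (3);

bag.TryTake (out int item);
Console.WriteLine (item);   // 3: the element added most recently on this thread
```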


Concurrent bags are ideal when the parallel operation on your collection mostly 
comprises Adding elements—or when the Adds and Takes are balanced on a thread. 
We saw an example of the former previously, when using Parallel.ForEach to 
implement a parallel spellchecker: 


var misspellings = new ConcurrentBag<Tuple<int,string>>();

Parallel.ForEach (wordsToTest, (word, state, i) =>
{
  if (!wordLookup.Contains (word))
    misspellings.Add (Tuple.Create ((int) i, word));
});


A concurrent bag would be a poor choice for a producer/consumer queue because 
elements are added and removed by different threads. 


BlockingCollection<T> 


If you call TryTake on any of the producer/consumer collections we discussed in the
previous section (ConcurrentStack<T>, ConcurrentQueue<T>, and ConcurrentBag<T>)
and the collection is empty, the method returns false. Sometimes, it would be
more useful in this scenario to wait until an element is available.


Rather than overloading the TryTake methods with this functionality (which would 
have caused a blowout of members after allowing for cancellation tokens and time- 
outs), PFX’s designers encapsulated this functionality into a wrapper class called 
BlockingCollection<T>. A blocking collection wraps any collection that imple- 
ments IProducerConsumerCollection<T> and lets you Take an element from the 
wrapped collection—blocking if no element is available. 


A blocking collection also lets you limit the total size of the collection, blocking the 
producer if that size is exceeded. A collection limited in this manner is called a boun- 
ded blocking collection. 


To use BlockingCollection<T>: 





1 Due to an implementation detail, there actually needs to be at least two elements to avoid conten- 
tion entirely. 
1. Instantiate the class, optionally specifying the IProducerConsumerCollection<T>
   to wrap, and the maximum size (bound) of the collection.


2. Call Add or TryAdd to add elements to the underlying collection. 


3. Call Take or TryTake to remove (consume) elements from the underlying 
collection. 


If you call the constructor without passing in a collection, the class will automati- 
cally instantiate a ConcurrentQueue<T>. The producing and consuming methods let 
you specify cancellation tokens and timeouts. Add and TryAdd may block if the col- 
lection size is bounded; Take and TryTake block while the collection is empty. 


Another way to consume elements is to call GetConsumingEnumerable. This returns 
a (potentially) infinite sequence that yields elements as they become available. You 
can force the sequence to end by calling CompleteAdding: this method also prevents 
further elements from being enqueued. 
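The following sketch puts those pieces together: a bounded collection (the capacity of 2 is an arbitrary choice), a producer task that calls Add and then CompleteAdding, and a consumer that drains the collection with GetConsumingEnumerable. Because the default backing store is a ConcurrentQueue<T>, items come out in FIFO order:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Bounded to 2 elements: Add blocks while two are waiting to be taken.
var queue = new BlockingCollection<int> (boundedCapacity: 2);

Task producer = Task.Run (() =>
{
  for (int i = 1; i <= 5; i++) queue.Add (i);
  queue.CompleteAdding();   // Lets GetConsumingEnumerable finish
});

var consumed = new List<int>();
foreach (int item in queue.GetConsumingEnumerable())   // Blocks while empty
  consumed.Add (item);

producer.Wait();
Console.WriteLine (string.Join (",", consumed));   // 1,2,3,4,5
```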


BlockingCollection also provides static methods called AddToAny and TakeFromAny,
which let you add or take an element while specifying several blocking
collections. The action is then honored by the first collection able to service the request.
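For example, TakeFromAny takes from whichever of several collections has an element available and returns the index of that collection. In this sketch only the second collection holds anything:

```csharp
using System;
using System.Collections.Concurrent;

var a = new BlockingCollection<string>();
var b = new BlockingCollection<string>();
b.Add ("from b");

// Returns the index (within the array) of the collection that supplied the element:
int index = BlockingCollection<string>.TakeFromAny (new[] { a, b }, out string item);
Console.WriteLine (index + ": " + item);   // 1: from b
```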


Writing a Producer/Consumer Queue 


A producer/consumer queue is a useful structure, both in parallel programming 
and general concurrency scenarios. Here's how it works: 


• A queue is set up to describe work items (or data upon which work is
  performed).


• When a task needs executing, it’s enqueued, and the caller gets on with other
  things.


• One or more worker threads plug away in the background, picking off and
  executing queued items.


A producer/consumer queue gives you precise control over how many worker 
threads execute at once, which is useful not only in limiting CPU consumption, but 
other resources as well. If the tasks perform intensive disk I/O, for instance, you can 
limit concurrency to avoid starving the operating system and other applications. 
You can also dynamically add and remove workers throughout the queue’s life. The 
CLR’s thread pool itself is a kind of producer/consumer queue, optimized for short- 
running compute-bound jobs. 


A producer/consumer queue typically holds items of data upon which (the same) 
task is performed. For example, the items of data may be filenames, and the task 
might be to encrypt those files. By making the item a delegate, however, you can 
write a more general-purpose producer/consumer queue where each item can do 
anything. 
Online, we show how to write a producer/consumer queue from scratch using an
AutoResetEvent (and later, using Monitor’s Wait and Pulse). However, writing a 
producer/consumer from scratch is unnecessary because most of the functionality is 
provided by BlockingCollection<T>. Here’s how to use it: 


public class PCQueue : IDisposable
{
  BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

  public PCQueue (int workerCount)
  {
    // Create and start a separate Task for each consumer:
    for (int i = 0; i < workerCount; i++)
      Task.Factory.StartNew (Consume);
  }

  public void Enqueue (Action action) { _taskQ.Add (action); }

  void Consume()
  {
    // This sequence that we're enumerating will block when no elements
    // are available and will end when CompleteAdding is called.

    foreach (Action action in _taskQ.GetConsumingEnumerable())
      action();     // Perform task.
  }

  public void Dispose() { _taskQ.CompleteAdding(); }
}


Because we didn’t pass anything into BlockingCollection’s constructor, it instantiated
a concurrent queue automatically. Had we passed in a ConcurrentStack, we’d
have ended up with a producer/consumer stack.
Using Tasks


The producer/consumer that we just wrote is inflexible in that we can’t track work 
items after they've been enqueued. It would be nice if we could do the following: 


• Know when a work item has completed (and await it)

• Cancel a work item

• Deal elegantly with any exceptions thrown by a work item


An ideal solution would be to have the Enqueue method return some object giving
us the functionality just described. The good news is that a class already exists to do
exactly this: the Task class, which we can generate either with a TaskCompletionSource
or by instantiating it directly (creating an unstarted or cold task):


public class PCQueue : IDisposable
{
  BlockingCollection<Task> _taskQ = new BlockingCollection<Task>();

  public PCQueue (int workerCount)
  {
    // Create and start a separate Task for each consumer:
    for (int i = 0; i < workerCount; i++)
      Task.Factory.StartNew (Consume);
  }

  public Task Enqueue (Action action, CancellationToken cancelToken
                                        = default (CancellationToken))
  {
    var task = new Task (action, cancelToken);
    _taskQ.Add (task);
    return task;
  }

  public Task<TResult> Enqueue<TResult> (Func<TResult> func,
    CancellationToken cancelToken = default (CancellationToken))
  {
    var task = new Task<TResult> (func, cancelToken);
    _taskQ.Add (task);
    return task;
  }

  void Consume()
  {
    foreach (var task in _taskQ.GetConsumingEnumerable())
      try
      {
        if (!task.IsCanceled) task.RunSynchronously();
      }
      catch (InvalidOperationException) { }   // Race condition
  }

  public void Dispose() { _taskQ.CompleteAdding(); }
}


In Enqueue, we enqueue and return to the caller a task that we create but don’t start.


In Consume, we run the task synchronously on the consumer’s thread. We catch an 
InvalidOperationException to handle the unlikely event that the task is canceled 
in between checking whether it’s canceled and running it. 


Here’s how we can use this class: 


var pcQ = new PCQueue (2); // Maximum concurrency of 2 
string result = await pcQ.Enqueue (() => "That was easy!"); 


Hence, we have all the benefits of tasks—with exception propagation, return values, 
and cancellation—while taking complete control over scheduling. 
Span<T> and Memory<T>








The Span<T> and Memory<T> structs act as low-level facades over an array, string, or 
any contiguous block of managed or unmanaged memory. Their main purpose is to 
help with certain kinds of micro-optimization—in particular, writing low-allocation 
code that minimizes managed memory allocations (thereby reducing the load on 
the garbage collector), without having to duplicate your code for different kinds of 
input. They also enable slicing—working with a portion of an array, string, or mem- 
ory block without creating a copy. 


Span<T> and Memory<T> are particularly useful in performance hotspots, such as the 
ASP.NET Core processing pipeline, or a JSON parser that serves an object database. 


Should you come across these types in an API and not need or 
care for their potential performance advantages: 


• When calling a method that expects a Span<T>, ReadOnlySpan<T>, Memory<T>,
  or ReadOnlyMemory<T>, pass in an array (that is, a T[]) instead. (This works
  thanks to implicit conversion operators.)


• Call the ToArray method to convert from a span/memory to an array. And if T
  is char, ToString will convert the span/memory into a string.


Specifically, Span<T> does two things: 


• It provides a common array-like interface over managed arrays, strings, and
  pointer-backed memory. This gives you the freedom to employ stack-allocated
  and unmanaged memory to avoid garbage collection, without having to duplicate
  code or mess with pointers.


• It allows slicing: exposing reusable subsections of the span without making
  copies.
Span<T> comprises just two fields, a pointer and a length. For
this reason, it can represent only contiguous blocks of memory.
(Should you need to work with noncontiguous memory, the
ReadOnlySequence<T> struct is available to serve as a linked
list of memory segments.)


Because Span<T> can wrap stack-allocated memory, there are restrictions on how 
you can store or pass around instances (imposed, in part, by Span<T> being a ref 
struct). Memory<T> acts as a span without those restrictions, but it cannot wrap 
stack-allocated memory. Memory<T> still provides the benefit of slicing. 
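A brief sketch of the difference in practice: a span over stackalloc’d memory versus a Memory<T> over an array, both sliced without copying (the values are arbitrary):

```csharp
using System;

// A Span<T> can wrap stack-allocated memory (no garbage-collected allocation)...
Span<int> onStack = stackalloc int[4];
for (int i = 0; i < onStack.Length; i++) onStack[i] = i * i;   // 0, 1, 4, 9

// ...whereas a Memory<T> cannot wrap stack memory; here it wraps an array.
Memory<int> onHeap = new int[] { 0, 1, 4, 9 };

// Both can be sliced without copying:
Span<int> lastTwo = onStack[2..];
Memory<int> lastTwoMem = onHeap[2..];
Console.WriteLine (lastTwo[0] + lastTwo[1]);                  // 13
Console.WriteLine (lastTwoMem.Span[0] + lastTwoMem.Span[1]);  // 13
```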


Each struct comes with a read-only counterpart (ReadOnlySpan<T> and
ReadOnlyMemory<T>). As well as preventing unintentional change, the read-only counterparts
further improve performance by allowing the compiler and runtime additional free- 
dom for optimization. 


.NET Core itself (and ASP.NET Core) uses these types to improve efficiency with
I/O, networking, string handling, and JSON parsing.


Span<T> and Memory<T>’s ability to perform array slicing makes
the old ArraySegment<T> type redundant. To help with any
transition, there are implicit conversion operators from
ArraySegment<T> to all of the span/memory structs; in the other
direction, MemoryMarshal.TryGetArray recovers an ArraySegment<T>
from an array-backed Memory<T> or ReadOnlyMemory<T>.
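For instance, existing code that produces an ArraySegment<T> can hand it straight to span- or memory-based APIs, as in this sketch; all three views share the original array:

```csharp
using System;

var array = new[] { 1, 2, 3, 4, 5 };
var segment = new ArraySegment<int> (array, 1, 3);   // Elements 2, 3, 4

// Implicit conversions from ArraySegment<T>; none of these copies the data:
Span<int> span = segment;
ReadOnlyMemory<int> memory = segment;

span[0] = 20;                          // Writes through to the array
Console.WriteLine (array[1]);          // 20
Console.WriteLine (memory.Span[0]);    // 20
Console.WriteLine (span.Length);       // 3
```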


Spans and Slicing 


Suppose that you're writing a method to sum an array of integers. A micro- 
optimized implementation would avoid LINQ in favor of a foreach loop: 


int Sum (int[] numbers)
{
  int total = 0;
  foreach (int i in numbers) total += i;
  return total;
}


Now imagine that you want to sum just a portion of the array. You have two options: 


• First copy the portion of the array that you want to sum into another array

• Add additional parameters (offset and count)

The first option is inefficient; the second option adds clutter and complexity (which
worsens with methods that need to accept more than one array).


Spans solve this nicely. All you need to do is to change the parameter type from 
int[] to ReadOnlySpan<int> (everything else stays the same): 


int Sum (ReadOnlySpan<int> numbers)
{
  int total = 0;
  foreach (int i in numbers) total += i;
  return total;
}


We used ReadOnlySpan<T> rather than Span<T> because we 
don't need to modify the array. There’s an implicit conversion 
from Span<T> to ReadOnlySpan<T>, so you can pass a Span<T> 
into a method that expects a ReadOnlySpan<T>. 


We can test this method as follows: 


var numbers = new int [1000]; 
for (int i = 0; i < numbers.Length; i++) numbers [i] = i; 


int total = Sum (numbers); 


We can call Sum with an array because there's an implicit conversion from T[] to 
Span<T> and ReadOnlySpan<T>. Another option is to use the AsSpan extension 
method: 


var span = numbers.AsSpan(); 


The indexer for ReadOnlySpan<T> uses C#’s ref readonly feature to reach directly 
into the underlying data: this allows our method to perform almost as well as the 
original example that used an array. But what we've gained is that we can now “slice” 
the array and sum just a portion of the elements as follows: 


// Sum the middle 500 elements (starting from position 250): 
int total = Sum (numbers.AsSpan (250, 500)); 


If you already have a Span<T> or ReadOnlySpan<T>, you can slice it by calling the 
Slice method: 


Span<int> span = numbers; 
int total = Sum (span.Slice (250, 500)); 


You can also use C# 8’s indices and ranges: 


Span<int> span = numbers;

Console.WriteLine (span [^1]);            // Last element
Console.WriteLine (Sum (span [..10]));    // First 10 elements
Console.WriteLine (Sum (span [100..]));   // Index 100 to end
Console.WriteLine (Sum (span [^5..]));    // Last 5 elements


Although Span<T> doesn’t implement IEnumerable<T> (it can’t implement interfaces
by virtue of being a ref struct), it does implement the enumeration pattern (a public
GetEnumerator method), which allows C#’s foreach statement to work (see
“Enumeration” on page 179 in Chapter 4).
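Pulling the snippets in this section together, here is a self-contained version that checks the sums (the array holds 0 through 999, as in the earlier test; the parameter is renamed values only for clarity):

```csharp
using System;

int Sum (ReadOnlySpan<int> values)
{
  int total = 0;
  foreach (int v in values) total += v;   // foreach works via the enumeration pattern
  return total;
}

var numbers = new int [1000];
for (int i = 0; i < numbers.Length; i++) numbers [i] = i;   // 0, 1, ..., 999

Span<int> span = numbers;
Console.WriteLine (Sum (numbers));                    // 499500
Console.WriteLine (Sum (numbers.AsSpan (250, 500)));  // 249750 (250 + ... + 749)
Console.WriteLine (Sum (span [..10]));                // 45
Console.WriteLine (Sum (span [^5..]));                // 4985 (995 + ... + 999)
Console.WriteLine (span [^1]);                        // 999
```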


CopyTo and TryCopyTo 


The CopyTo method copies elements from one span (or Memory<T>) to another. In
the following example, we copy all of the elements from span x into span y:

Span<int> x = new[] { 1, 2, 3, 4 };
Span<int> y = new int[4];
x.CopyTo (y);
Slicing makes this method much more useful. In the next example, we copy the first
half of span x into the second half of span y:

Span<int> x = new[] { 1, 2, 3, 4 };
Span<int> y = new[] { 10, 20, 30, 40 };
x[..2].CopyTo (y[2..]);     // y is now { 10, 20, 1, 2 }

If there’s not enough space in the destination to complete the copy, CopyTo throws
an exception, whereas TryCopyTo returns false (without copying any elements).


The span structs also expose methods to Clear and Fill the span as well as an 
IndexOf method to search for an element in the span. 
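A quick sketch of TryCopyTo, Fill, Clear, and IndexOf on small arrays (the values are arbitrary):

```csharp
using System;

Span<int> source = new[] { 1, 2, 3, 4 };
Span<int> small = new int[2];

bool copied = source.TryCopyTo (small);   // false: destination too short
source[..2].CopyTo (small);               // Slicing makes it fit: { 1, 2 }

Span<int> buffer = new int[5];
buffer.Fill (7);                          // { 7, 7, 7, 7, 7 }
buffer[3..].Clear();                      // { 7, 7, 7, 0, 0 }
int firstZero = buffer.IndexOf (0);       // 3

Console.WriteLine (copied);      // False
Console.WriteLine (firstZero);   // 3
```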


Working with Text 


Spans are designed to work well with strings, which are treated as
ReadOnlySpan<char>. The following method counts whitespace characters:


int CountWhitespace (ReadOnlySpan<char> s)
{
  int count = 0;
  foreach (char c in s)
    if (char.IsWhiteSpace (c))
      count++;
  return count;
}


You can call such a method with a string (thanks to an implicit conversion 
operator): 


int x = CountWhitespace ("Word1 Word2");   // OK

or with a substring:

int y = CountWhitespace (someString.AsSpan (20, 10));

The ToString() method converts a ReadOnlySpan<char> back to a string.


Extension methods ensure that some of the commonly used methods on the string 
class are also available to ReadOnlySpan<char>: 


var span = "This ".AsSpan();                    // ReadOnlySpan<char>
Console.WriteLine (span.StartsWith ("This"));   // True
Console.WriteLine (span.Trim().Length);         // 4


(Note that methods such as StartsWith use ordinal comparison, whereas the corre- 
sponding methods on the string class use culture-sensitive comparison by default.) 


Methods such as ToUpper and ToLower are available, but you must pass in a destina- 
tion span with the correct length (this allows you to decide how and where to allo- 
cate the memory). 
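For example, this sketch uppercases into a caller-supplied, stack-allocated buffer. It uses ToUpperInvariant (the culture-independent variant) so that the output doesn’t depend on the current culture:

```csharp
using System;

ReadOnlySpan<char> source = "hello".AsSpan();

// The caller decides where the result goes; here, a stack-allocated buffer:
Span<char> destination = stackalloc char[source.Length];
int written = source.ToUpperInvariant (destination);   // -1 if destination is too small

Console.WriteLine (written);                   // 5
Console.WriteLine (destination.ToString());    // HELLO
```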


Some of string’s methods are unavailable, such as Split (which splits a string into 
an array of words). It’s actually impossible to write the direct equivalent of string’s 
Split method because you cannot create an array of spans. 
This is because spans are defined as ref structs, which can exist
only on the stack.


(By “exist only on the stack,” we mean that the struct itself can
exist only on the stack. The content that the span wraps can,
and does in this case, exist on the heap.)


The System.Buffers.Text namespace contains additional types to help you work 
with span-based text, including the following: 


• Utf8Formatter.TryFormat does the equivalent of calling ToString on built-in
  and simple types such as decimal, DateTime, and so on but writes to a span
  instead of a string.


• Utf8Parser.TryParse does the reverse and parses data from a span into a
  simple type.

• The Base64 type provides methods for reading/writing base-64 data.


Fundamental CLR methods such as int.Parse have also been overloaded to accept 
ReadOnlySpan<char>. 
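A sketch combining these: int.Parse reads digits straight from a slice of a string, and Utf8Formatter and Utf8Parser round-trip the value through a UTF-8 byte buffer (the buffer size of 16 is an arbitrary choice, comfortably large enough for an int):

```csharp
using System;
using System.Buffers.Text;

// Parse a number out of the middle of a string without allocating a substring:
ReadOnlySpan<char> text = "Total: 123".AsSpan();
int value = int.Parse (text[7..]);

// Format the number as UTF-8 bytes into a stack buffer, then parse it back:
Span<byte> utf8 = stackalloc byte[16];
bool formatted = Utf8Formatter.TryFormat (value, utf8, out int bytesWritten);
bool parsed = Utf8Parser.TryParse (utf8[..bytesWritten], out int roundTrip, out int bytesConsumed);

Console.WriteLine (value);          // 123
Console.WriteLine (bytesWritten);   // 3
Console.WriteLine (roundTrip);      // 123
```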


Memory<T> 


Span<T> and ReadOnlySpan<T> are defined as ref structs to maximize their optimiza- 
tion potential as well as allowing them to work safely with stack-allocated memory 
(as you'll see in the next section). However, it also imposes limitations. In addition 
to being array-unfriendly, you cannot use them as fields in a class (this would put 
them on the heap). This, in turn, prevents them from appearing in lambda expres- 
sions—and as parameters in asynchronous methods, iterators, and asynchronous 
streams: 


async void Foo (Span<int> notAllowed) // Compile-time error! 


(Remember that the compiler processes asynchronous methods and iterators by 
writing a private state machine, which means that any parameters and local variables 
end up as fields. The same applies to lambda expressions that close over variables: 
these also end up as fields in a closure.) 


The Memory<T> and ReadOnlyMemory<T> structs work around this, acting as spans 
that cannot wrap stack-allocated memory, allowing their use in fields, lambda 
expressions, asynchronous methods, and so on. 


You can obtain a Memory<T> or ReadOnlyMemory<T> from an array via an implicit 
conversion or the AsMemory() extension method: 


Memory<int> mem1 = new int[] { 1, 2, 3 };
var mem2 = new int[] { 1, 2, 3 }.AsMemory();

You can easily convert a Memory<T> or ReadOnlyMemory<T> into a Span<T> or
ReadOnlySpan<T> via its Span property so that you can interact with it as though
it were a span. The conversion is efficient in that it doesn’t perform any copying:
void Foo (Memory<int> memory)   // Not async: C# 8 disallows Span<T> locals in async methods
{
  Span<int> span = memory.Span;
  ...
}


(You can also directly slice a Memory<T> or ReadOnlyMemory<T> via its Slice method
or a C# range, and access its length via its Length property.)
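The following sketch (the MemoryDemo class and its members are hypothetical) shows what Memory<T> allows that Span<T> does not. It is stored in a class field, it survives an await, and it can be sliced with either Slice or a C# range. It is converted to a span only inside a synchronous helper.

```csharp
using System;
using System.Threading.Tasks;

class MemoryDemo
{
    // Legal: Memory<T> is an ordinary struct, so it can be a field of a class.
    // Declaring a Span<int> field here would be a compile-time error.
    readonly Memory<int> _data;

    public MemoryDemo (int[] array) => _data = array;    // implicit conversion from int[]

    public int HeadLength => _data.Slice (0, 2).Length;  // Slice (start, length)

    public async Task<int> SumTailAsync()
    {
        await Task.Yield();              // a Memory<T> field is fine across an await
        Memory<int> tail = _data [1..];  // slice with a C# range; nothing is copied
        return Sum (tail.Span);          // view it as a span for the synchronous work
    }

    static int Sum (ReadOnlySpan<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers) total += n;
        return total;
    }
}
```

With new MemoryDemo (new[] { 1, 2, 3, 4 }), SumTailAsync yields 9 (2 + 3 + 4) and HeadLength is 2.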


Another way to obtain a Memory<T> is to rent it from a pool, 
using the System.Buffers.MemoryPool<T> class. This works 
just like array pooling (see “Array Pooling” on page 541 in 
Chapter 12) and offers another strategy for reducing the load 
on the garbage collector. 
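A minimal sketch of renting from MemoryPool<T>. The PoolDemo class and FillAndSum method are hypothetical. Rent returns an IMemoryOwner<T>, and disposing it hands the memory back to the pool. The rented block can be larger than requested, so the code slices it to the size it needs.

```csharp
using System;
using System.Buffers;

static class PoolDemo
{
    // Rents at least 'size' ints from the shared pool, fills them with 1..size,
    // and returns their sum. Disposing the owner returns the block to the pool.
    public static int FillAndSum (int size)
    {
        using (IMemoryOwner<int> owner = MemoryPool<int>.Shared.Rent (size))
        {
            // The pool may hand back more than we asked for, so trim it.
            Memory<int> mem = owner.Memory.Slice (0, size);
            Span<int> span = mem.Span;
            for (int i = 0; i < span.Length; i++) span [i] = i + 1;

            int total = 0;
            foreach (int n in span) total += n;
            return total;
        }
    }
}
```

Rented memory is not cleared, which is why the sketch writes every element before reading it.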


We said in the previous section that you cannot write the direct equivalent of 
string.Split for spans, because you cannot create an array of spans. This limitation 
does not apply to ReadOnlyMemory<char>: 


// Split a string into words:
IEnumerable<ReadOnlyMemory<char>> Split (ReadOnlyMemory<char> input)
{
  int wordStart = 0;
  for (int i = 0; i <= input.Length; i++)
    if (i == input.Length || char.IsWhiteSpace (input.Span [i]))
    {
      yield return input [wordStart..i];   // Slice with C# range operator
      wordStart = i + 1;
    }
}
This is more efficient than string’s Split method: instead of creating new strings for 
each word, it returns slices of the original string: 


foreach (var slice in Split ("The quick brown fox jumps over the lazy dog".AsMemory()))
{
  // slice is a ReadOnlyMemory<char>
}
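To check the slicing behavior end to end, here is a self-contained copy of the same Split algorithm, plus a hypothetical Words helper that turns each slice into a string. Materializing strings defeats the purpose in real code; here it only makes the result easy to inspect. Note that a string has no implicit conversion to ReadOnlyMemory<char>, so AsMemory is required.

```csharp
using System;
using System.Collections.Generic;

static class MemorySplitter
{
    // Same algorithm as the Split method in the text.
    public static IEnumerable<ReadOnlyMemory<char>> Split (ReadOnlyMemory<char> input)
    {
        int wordStart = 0;
        for (int i = 0; i <= input.Length; i++)
            if (i == input.Length || char.IsWhiteSpace (input.Span [i]))
            {
                yield return input [wordStart..i];   // a slice, not a copy
                wordStart = i + 1;
            }
    }

    // Hypothetical helper for inspection: converts each slice to a string.
    public static List<string> Words (string text)
    {
        var result = new List<string>();
        foreach (ReadOnlyMemory<char> slice in Split (text.AsMemory()))
            result.Add (slice.ToString());
        return result;
    }
}
```

Consecutive whitespace characters produce empty slices: Words ("a  b") yields "a", "", and "b".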


You can easily convert a Memory<T> into a Span<T> (via the 
Span property), but not vice versa. For this reason, it’s better to 
write methods that accept Span<T> than Memory<T> when you 
have a choice. 

For the same reason, it’s better to write methods that accept 
ReadOnlySpan<T> than Span<T>. 


Forward-Only Enumerators 


In the preceding section, we employed ReadOnlyMemory<char> as a solution to 
implementing a string-style Split method. But by giving up on ReadOnlySpan<char>,
we lost the ability to slice spans backed by unmanaged memory. Let's
revisit ReadOnlySpan<char> to see whether we can find another solution. 
One possible option would be to write our Split method so that it returns ranges:

Range[] Split (ReadOnlySpan<char> input)
{
  int pos = 0;
  var list = new List<Range>();
  for (int i = 0; i <= input.Length; i++)
    if (i == input.Length || char.IsWhiteSpace (input [i]))
    {
      list.Add (new Range (pos, i));
      pos = i + 1;
    }
  return list.ToArray();
}


The caller could then use those ranges to slice the original span: 


ReadOnlySpan<char> source = "The quick brown fox";
foreach (Range range in Split (source))
{
  ReadOnlySpan<char> wordSpan = source [range];
  ...
}


This is an improvement, but it’s still imperfect. One of the reasons for using spans in 
the first place is to avoid memory allocations. But notice that our Split method cre- 
ates a List<Range>, adds items to it, and then converts the list into an array. This 
incurs at least two memory allocations as well as a memory-copy operation. 


The solution to this is to eschew the list and array in favor of a forward-only enu- 
merator. An enumerator is clumsier to work with, but it can be made allocation-free 
with the use of structs: 


// We must define this as a ref struct, because _input is a ref struct.
public readonly ref struct CharSpanSplitter
{
  readonly ReadOnlySpan<char> _input;
  public CharSpanSplitter (ReadOnlySpan<char> input) => _input = input;
  public Enumerator GetEnumerator() => new Enumerator (_input);

  public ref struct Enumerator   // Forward-only enumerator
  {
    readonly ReadOnlySpan<char> _input;
    int _wordPos;
    public ReadOnlySpan<char> Current { get; private set; }

    public Enumerator (ReadOnlySpan<char> input)
    {
      _input = input;
      _wordPos = 0;
      Current = default;
    }

    public bool MoveNext()
    {
      for (int i = _wordPos; i <= _input.Length; i++)
        if (i == _input.Length || char.IsWhiteSpace (_input [i]))
        {
          Current = _input [_wordPos..i];
          _wordPos = i + 1;
          return true;
        }
      return false;
    }
  }
}


public static class CharSpanExtensions
{
  public static CharSpanSplitter Split (this ReadOnlySpan<char> input)
    => new CharSpanSplitter (input);

  public static CharSpanSplitter Split (this Span<char> input)
    => new CharSpanSplitter (input);
}


Here’s how you would call it: 


var span = "the quick brown fox".AsSpan();
foreach (var word in span.Split())
{
  // word is a ReadOnlySpan<char>
}


By defining a Current property and a MoveNext method, our enumerator can work 
with C#’s foreach statement (see “Enumeration” on page 179 in Chapter 4). We 
don't have to implement the IEnumerable<T>/IEnumerator<T> interfaces (in fact, 
we can't; ref structs can't implement interfaces). We're sacrificing abstraction for
micro-optimization. 
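A minimal illustration of this pattern-based foreach, using a hypothetical Countdown type that is unrelated to splitting. The compiler looks only for a GetEnumerator method whose result has MoveNext and Current. No interface is involved, so both the enumerable and the enumerator can be ref structs.

```csharp
using System.Text;

// Hypothetical type: counts down from 'from' to 1.
public readonly ref struct Countdown
{
    readonly int _from;
    public Countdown (int from) => _from = from;
    public Enumerator GetEnumerator() => new Enumerator (_from);

    public ref struct Enumerator
    {
        int _current;
        public Enumerator (int from) => _current = from + 1;
        public int Current => _current;
        public bool MoveNext() => --_current > 0;   // stop after reaching 1
    }
}

public static class CountdownDemo
{
    // foreach binds to GetEnumerator/MoveNext/Current by name.
    public static string Describe (int from)
    {
        var sb = new StringBuilder();
        foreach (int n in new Countdown (from)) sb.Append (n).Append (' ');
        return sb.ToString().TrimEnd();
    }
}
```

CountdownDemo.Describe (3) returns "3 2 1".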


Working with Stack-Allocated and Unmanaged Memory 


Another effective micro-optimization technique is to reduce the load on the garbage 
collector by minimizing heap-based allocations. This means making greater use of 
stack-based memory—or even unmanaged memory. 


Unfortunately, this normally requires that you rewrite code to use pointers. In the 
case of our previous example that summed elements in an array, we would need to 
write another version: 


unsafe int Sum (int* numbers, int length)
{
  int total = 0;
  for (int i = 0; i < length; i++) total += numbers [i];
  return total;
}
so that we could do this:


int* numbers = stackalloc int [1000]; // Allocate array on the stack 
int total = Sum (numbers, 1000); 


Spans solve this problem: you can construct a Span<T> or ReadOnlySpan<T> directly 
from a pointer: 


int* numbers = stackalloc int [1000]; 
var span = new Span<int> (numbers, 1000); 


Or in one step: 
Span<int> numbers = stackalloc int [1000]; 


(Note that this doesn’t require the use of unsafe.) Recall the Sum method that we 
wrote previously: 


int Sum (ReadOnlySpan<int> numbers)
{
  int total = 0;
  int len = numbers.Length;
  for (int i = 0; i < len; i++) total += numbers [i];
  return total;
}


This method works equally well for a stack-allocated span. We have gained on three 
counts: 


• The same method works with both arrays and stack-allocated memory.
• We can use stack-allocated memory with minimal use of pointers.
• The span can be sliced.
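All three gains are visible in the following sketch. The StackSpanDemo class and its method names are hypothetical. The same span-based Sum serves a stackalloc buffer and a slice of an ordinary array, and no unsafe code is needed.

```csharp
using System;

static class StackSpanDemo
{
    // Same shape as the Sum method in the text.
    static int Sum (ReadOnlySpan<int> numbers)
    {
        int total = 0;
        foreach (int n in numbers) total += n;
        return total;
    }

    // Keep n small: stack space is limited.
    public static int SumOfFirstN (int n)
    {
        Span<int> numbers = stackalloc int [n];   // no 'unsafe' required
        for (int i = 0; i < n; i++) numbers [i] = i + 1;
        return Sum (numbers);
    }

    public static int SumOfArraySlice()
    {
        int[] array = { 10, 20, 30, 40 };
        return Sum (array.AsSpan (1, 2));         // 20 + 30
    }
}
```

SumOfFirstN (1000) returns 500500 and SumOfArraySlice returns 50.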


The compiler is smart enough to prevent you from writing a 
method that allocates memory on the stack and returns it to 
the caller via a Span<T> or ReadOnlySpan<T>. 


(In other scenarios, however, you can legally return a Span<T> 
or ReadOnlySpan<T>.) 


You can also use spans to wrap memory that you allocate from the unmanaged 
heap. In the following example, we allocate unmanaged memory using the 
Marshal.AllocHGlobal function, wrap it in a Span<char>, and then copy a string 
into the unmanaged memory. Finally, we employ the CharSpanSplitter struct that 
we wrote in the preceding section to split the unmanaged string into words: 


var source = "The quick brown fox".AsSpan();
var ptr = Marshal.AllocHGlobal (source.Length * sizeof (char));
try
{
  var unmanaged = new Span<char> ((char*)ptr, source.Length);
  source.CopyTo (unmanaged);
  foreach (var word in unmanaged.Split())
    Console.WriteLine (word.ToString());
}
finally { Marshal.FreeHGlobal (ptr); }
A nice bonus is that Span<T>’s indexer performs bounds-checking, preventing a 
buffer overrun. This protection applies if you correctly instantiate Span<T>: in our 
example, you would lose this protection if you wrongly obtained the span: 


var span = new Span<char> ((char*)ptr, source.Length * 2); 


There’s also no protection from the equivalent of a dangling pointer, so you must 
take care not to access the span after releasing its unmanaged memory with
Marshal.FreeHGlobal.
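You can observe the bounds check without any unsafe code by using a stack-allocated span. In this sketch (BoundsDemo is a hypothetical name), writing one element past the end throws IndexOutOfRangeException instead of corrupting adjacent memory.

```csharp
using System;

static class BoundsDemo
{
    // Returns true if a write one element past the end of the span is rejected.
    public static bool OverrunIsCaught()
    {
        Span<byte> buffer = stackalloc byte [4];
        try
        {
            buffer [4] = 1;   // index 4 is one past the end
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```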





974 | Chapter 24: Span<T> and Memory<T> 


25 


Native and COM Interoperability 








This chapter describes how to integrate with native (unmanaged) Dynamic-Link 
Libraries (DLLs) and Component Object Model (COM) components. Unless otherwise
stated, the types mentioned in this chapter exist in either the System or the
System.Runtime.InteropServices namespace.


Calling into Native DLLs 


P/Invoke, short for Platform Invocation Services, allows you to access functions, 
structs, and callbacks in unmanaged DLLs (shared libraries on Unix). 


For example, consider the MessageBox function, defined in the Windows DLL 
user32.dll as follows: 


int MessageBox (HWND hWnd, LPCTSTR lpText, LPCTSTR lpCaption, UINT uType);


You can call this function directly by declaring a static method of the same name, 
applying the extern keyword, and adding the DllImport attribute: 


using System; 
using System.Runtime.InteropServices; 


class MsgBoxTest 
{ 
  [DllImport("user32.dll")]
static extern int MessageBox (IntPtr hWnd, string text, string caption, 
int type); 
public static void Main() 
{ 
MessageBox (IntPtr.Zero, 
"Please do not press this again.", "Attention", 0); 
} 
} 


The MessageBox classes in the System.Windows and System.Windows.Forms name- 
spaces themselves call similar unmanaged methods. 
Here's a DllImport example for Ubuntu Linux:


[DllImport("libc")]
public static extern uint getuid();

public static void PrintUserID()
{
  Console.WriteLine ($"User ID: {getuid()}");
}


The CLR includes a marshaler that knows how to convert parameters and return 
values between .NET types and unmanaged types. In the Windows example, the int 
parameters translate directly to four-byte integers that the function expects, and the 
string parameters are converted into null-terminated arrays of Unicode characters 
(encoded in UTF-16). IntPtr is a struct designed to encapsulate an unmanaged 
handle; it’s 32 bits wide on 32-bit platforms and 64 bits wide on 64-bit platforms. A 
similar translation happens on Unix. 


Type Marshaling 


Marshaling Common Types 


On the unmanaged side, there can be more than one way to represent a given data 
type. A string, for instance, can contain single-byte ANSI characters or UTF-16 
Unicode characters, and can be length prefixed, null terminated, or of fixed length. 
With the MarshalAs attribute, you can specify to the CLR marshaler the variation in 
use, so it can provide the correct translation. Here’s an example: 


[DllImport("...")]
static extern int Foo ( [MarshalAs (UnmanagedType.LPStr)] string s ); 


The UnmanagedType enumeration includes all the Win32 and COM types that the 
marshaler understands. In this case, the marshaler was told to translate to LPStr, 
which is a null-terminated single-byte ANSI string. 
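To see what such a buffer looks like, the following sketch uses Marshal.StringToHGlobalAnsi. This method builds the same kind of null-terminated, one-byte-per-character string in unmanaged memory, which is roughly the form the marshaler produces for LPStr. LpStrDemo and Inspect are hypothetical names. For plain ASCII text each character occupies exactly one byte.

```csharp
using System;
using System.Runtime.InteropServices;

static class LpStrDemo
{
    // Copies 's' into unmanaged memory as a single-byte, null-terminated string,
    // inspects the bytes, converts it back, and frees the memory.
    public static (int firstByte, int terminator, string roundTrip) Inspect (string s)
    {
        IntPtr p = Marshal.StringToHGlobalAnsi (s);
        try
        {
            int first = Marshal.ReadByte (p, 0);
            int term = Marshal.ReadByte (p, s.Length);   // byte after the last char (ASCII input)
            return (first, term, Marshal.PtrToStringAnsi (p));
        }
        finally { Marshal.FreeHGlobal (p); }
    }
}
```

Inspect ("Hi") returns (72, 0, "Hi"): the byte for 'H', then a zero terminator after the two characters.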


On the .NET side, you also have some choice as to what data type to use. Unmanaged
handles, for instance, can map to IntPtr, int, uint, long, or ulong.


Most unmanaged handles encapsulate an address or pointer 
and so must be mapped to IntPtr for compatibility with both 
32- and 64-bit operating systems. A typical example is 
HWND. 


Quite often with Win32 and POSIX functions, you come across an integer parame- 
ter that accepts a set of constants, defined in a C++ header file such as WinUser.h. 
Rather than defining these as simple C# constants, you can define them within an 
enum, instead. Using an enum can make for tidier code as well as increase static 
type safety. We provide an example in “Shared Memory” on page 982. 
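As a small sketch of the enum approach, the following hypothetical MessageBoxType enum holds a few of the MB_* values from WinUser.h. It replaces the raw int type parameter of the MessageBox declaration shown earlier, and the marshaler passes the enum as its underlying uint. The declaration compiles on any platform, but calling it works only on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical enum; values are the MB_* constants from WinUser.h.
enum MessageBoxType : uint
{
    Ok              = 0x00000000,   // MB_OK
    OkCancel        = 0x00000001,   // MB_OKCANCEL
    YesNo           = 0x00000004,   // MB_YESNO
    IconError       = 0x00000010,   // MB_ICONERROR
    IconQuestion    = 0x00000020,   // MB_ICONQUESTION
    IconWarning     = 0x00000030,   // MB_ICONWARNING
    IconInformation = 0x00000040    // MB_ICONINFORMATION
}

static class NativeMethods
{
    // The enum parameter documents which values are legal.
    [DllImport ("user32.dll", CharSet = CharSet.Unicode)]
    public static extern int MessageBox (IntPtr hWnd, string text, string caption,
                                         MessageBoxType type);

    // Button and icon options are combined with |, just as in C.
    public static uint YesNoWarning => (uint) (MessageBoxType.YesNo | MessageBoxType.IconWarning);
}
```

On Windows, a call would read MessageBox (IntPtr.Zero, "Continue?", "Attention", MessageBoxType.YesNo | MessageBoxType.IconWarning). A mistyped constant then becomes a compile-time error instead of a wrong number.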
When installing Microsoft Visual Studio, be sure to install the
C++ header files, even if you choose nothing else in the C++
category. This is where all the native Win32 constants are
defined. You can then locate all header files by searching for
*.h in the Visual Studio program directory.


On Unix, the POSIX standard defines names of constants, but 
individual implementations of POSIX-compliant Unix sys- 
tems may assign different numeric values to these constants. 
You must use the correct numeric value for your operating 
system of choice. Similarly, POSIX defines a standard for 
structs used in interop calls. The ordering of fields in the 
struct is not fixed by the standard, and a Unix implementation 
might add additional fields. C++ header files defining functions
and types are often installed in /usr/include or
/usr/local/include.


Receiving strings from unmanaged code back to .NET requires that some memory 
management take place. The marshaler automatically performs this work if you 
declare the external method with a StringBuilder rather than a string, as follows: 


[DllImport("kernel32.dll")]
static extern int GetWindowsDirectory (StringBuilder sb, int maxChars);

static void Main()
{
  StringBuilder s = new StringBuilder (256);
  GetWindowsDirectory (s, 256);
  Console.WriteLine (s);
}


On Unix, it works similarly. The following calls getcwd to return the current 
directory: 


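[DllImport("libc")]
private static extern IntPtr getcwd (StringBuilder buf, int size);

// The return value is declared as IntPtr rather than string: the marshaler
// assumes a returned string is memory it should free, but getcwd returns buf.
var sb = new StringBuilder (256);
getcwd (sb, sb.Capacity);
Console.WriteLine (sb);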


Although StringBuilder is convenient to use, it’s somewhat inefficient in that the 
CLR must perform additional memory allocations and copying. In performance 
hotspots, you can avoid this overhead by using char[] instead: 
[DllImport ("kernel32.dll", CharSet = CharSet.Unicode)]
static extern int GetWindowsDirectory (char[] buffer, int maxChars); 


Notice that you must specify a CharSet in the DllImport attribute. You must also 
trim the output string to length after calling the function. You can achieve this while 
minimizing memory allocations with the use of array pooling (see “Array Pooling” 
on page 541 in Chapter 12), as follows: 


string GetWindowsDirectory()
{
  var array = ArrayPool<char>.Shared.Rent (256);
  try
  {
    int length = GetWindowsDirectory (array, 256);
    return new string (array, 0, length);
  }
  finally { ArrayPool<char>.Shared.Return (array); }
}


(Of course, this example is contrived in that you can obtain the Windows directory 
via the built-in Environment.GetFolderPath method.) 


If you are unsure how to call a particular Win32 or Unix 
method, you will usually find an example on the internet if 
you search for the method name and DllImport. For Windows,
there is a wiki that aims to document all Win32 signatures. 


Marshaling Classes and Structs 


Sometimes, you need to pass a struct to an unmanaged method. For example, 
GetSystemTime in the Win32 API is defined as follows: 


void GetSystemTime (LPSYSTEMTIME lpSystemTime);


LPSYSTEMTIME conforms to this C struct: 


typedef struct _SYSTEMTIME {
  WORD wYear;
  WORD wMonth;
  WORD wDayOfWeek;
  WORD wDay;
  WORD wHour;
  WORD wMinute;
  WORD wSecond;
  WORD wMilliseconds;
} SYSTEMTIME, *PSYSTEMTIME;


To call GetSystemTime, we must define a .NET class or struct that matches this C
struct:


using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
class SystemTime
{
  public ushort Year;
  public ushort Month;
  public ushort DayOfWeek;
  public ushort Day;
  public ushort Hour;
  public ushort Minute;
  public ushort Second;
  public ushort Milliseconds;
}
The StructLayout attribute instructs the marshaler how to map each field to its
unmanaged counterpart. LayoutKind.Sequential means that we want the fields 
aligned sequentially on pack-size boundaries (you'll see what this means shortly), 
just as they would be in a C struct. The field names here are irrelevant; it’s the order- 
ing of fields that’s important. 


Now we can call GetSystemTime: 


[DllImport("kernel32.dll")]
static extern void GetSystemTime (SystemTime t);

static void Main()
{
  SystemTime t = new SystemTime();
  GetSystemTime (t);
  Console.WriteLine (t.Year);
}
Similarly, on Unix: 


[StructLayout(LayoutKind.Sequential)]
struct Timespec
{
  public long tv_sec;   /* seconds */
  public long tv_nsec;  /* nanoseconds */
}

[DllImport("libc")]
private static extern int clock_gettime (int clk_id, ref Timespec tp);

static DateTime startOfUnixTime =
  new DateTime (1970, 1, 1, 0, 0, 0, 0, System.DateTimeKind.Utc);

static void Main() => Console.WriteLine (GetSystemTime());

static DateTime GetSystemTime()
{
  Timespec tp = new Timespec();
  int success = clock_gettime (0, ref tp);
  if (success != 0) throw new Exception ("Error checking the time.");
  return startOfUnixTime.AddSeconds (tp.tv_sec).ToLocalTime();
}
In both C and C#, fields in an object are located at a fixed number of bytes from the
address of that object. The difference is that in a C# program, the CLR finds this
offset by looking it up using the field token; C field names are compiled directly into
offsets. For instance, in C, wDay is just a token to represent whatever is at the address
of a SYSTEMTIME instance plus 6 bytes (it follows three 2-byte WORD fields).


For access speed, each field is placed at an offset that is a multiple of the field's size.
That alignment, however, is capped at x bytes, where x is the pack size. In the
current implementation, the default pack size is 8 bytes, so a struct comprising an
sbyte followed by an (8-byte) long occupies 16 bytes, and the 7 bytes following the
sbyte are wasted. You can lessen or eliminate this wastage by specifying
a pack size via the Pack property of the StructLayout attribute: this makes the fields
align to offsets that are multiples of the smaller of the field's size and the specified
pack size. So, with a pack size of one, the struct just described would occupy just
nine bytes. You can specify pack sizes of 1, 2, 4, 8, or 16 bytes.
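The sbyte-plus-long example can be checked with Marshal.SizeOf and Marshal.OffsetOf, which report the marshaled (unmanaged) layout. The struct and class names below are hypothetical. The expected values assume a typical 64-bit platform, where a long is aligned on an 8-byte boundary.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical structs mirroring the example in the text.
[StructLayout (LayoutKind.Sequential)]
struct DefaultPacked { public sbyte A; public long B; }

[StructLayout (LayoutKind.Sequential, Pack = 1)]
struct TightlyPacked { public sbyte A; public long B; }

static class PackDemo
{
    public static int DefaultSize => Marshal.SizeOf<DefaultPacked>();   // 16 on x64
    public static int TightSize   => Marshal.SizeOf<TightlyPacked>();   // 9

    // Offset of B: 8 with default packing (7 padding bytes after A).
    public static int DefaultOffsetOfB => (int) Marshal.OffsetOf<DefaultPacked> ("B");
    public static int TightOffsetOfB   => (int) Marshal.OffsetOf<TightlyPacked> ("B");
}
```

With Pack = 1, B starts at offset 1, immediately after A. This is what saves the seven bytes.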


The StructLayout attribute also lets you specify explicit field offsets (see “Simulat- 
ing a C Union” on page 981). 


In and Out Marshaling 


In the previous example, we implemented SystemTime as a class. We could have 
instead chosen a struct—provided that GetSystemTime was declared with a ref or 
out parameter: 


[DllImport("kernel32.dll")]
static extern void GetSystemTime (out SystemTime t);
In most cases, C#'s directional parameter semantics work the same with external
methods. Pass-by-value parameters are copied in, C# ref parameters are copied
in/out, and C# out parameters are copied out. However, there are some exceptions for
types that have special conversions. For instance, array classes and the
StringBuilder class require copying when coming out of a function, so they are in/out.
It is occasionally useful to override this behavior with the In and Out attributes. For
example, if an array should be read-only, the In attribute indicates that the array
should be copied only going into the function, not coming out of it:


static extern void Foo ( [In] int[] array); 


Callbacks from Unmanaged Code 


The P/Invoke layer does its best to present a natural programming model on both 
sides of the boundary, mapping between relevant constructs where possible. 
Because C# not only can call out to C functions, but also can be called back from the 
C functions (via function pointers), the P/Invoke layer maps unmanaged function 
pointers into the nearest equivalent in C#, which is delegates. 


As an example, you can enumerate all top-level window handles with this method
in User32.dll:


BOOL EnumWindows (WNDENUMPROC lpEnumFunc, LPARAM lParam);


WNDENUMPROC is a callback that is fired with the handle of each window in sequence 
(or until the callback returns false). Here is its definition: 


BOOL CALLBACK EnumWindowsProc (HWND hwnd, LPARAM lParam); 


To use this, we declare a delegate with a matching signature and then pass a delegate 
instance to the external method: 


using System;
using System.Runtime.InteropServices;

class CallbackFun
{
  delegate bool EnumWindowsCallback (IntPtr hWnd, IntPtr lParam);

  [DllImport("user32.dll")]
  static extern int EnumWindows (EnumWindowsCallback hWnd, IntPtr lParam);

  static bool PrintWindow (IntPtr hWnd, IntPtr lParam)
  {
    Console.WriteLine (hWnd.ToInt64());
    return true;
  }

  static void Main() => EnumWindows (PrintWindow, IntPtr.Zero);
}


Simulating a C Union 


Each field in a struct is given enough room to store its data. Consider a struct 
containing one int and one char. The int is likely to start at an offset of 0 and is 
guaranteed at least four bytes. So, the char would start at an offset of at least 4. If, for 
some reason, the char started at an offset of 2, you'd change the value of the int if
you assigned a value to the char. Sounds like mayhem, doesn't it? Strangely enough, 
the C language supports a variation on a struct called a union that does exactly this. 
You can simulate this in C# by using LayoutKind.Explicit and the FieldOffset 
attribute. 


It might be challenging to think of a case in which this would be useful. However, 
suppose that you want to play a note on an external synthesizer. The Windows Mul- 
timedia API provides a function for doing just this via the MIDI protocol: 


[DllImport ("winmm.dll")]
public static extern uint midiOutShortMsg (IntPtr handle, uint message); 


The second argument, message, describes what note to play. The problem is in con- 
structing this 32-bit unsigned integer: it’s divided internally into bytes, representing 
a MIDI channel, note, and velocity at which to strike. One solution is to shift and 
mask via the bitwise <<, >>, & and | operators to convert these bytes to and from the 
32-bit “packed” message. Far simpler, though, is to define a struct with explicit 
layout: 


[StructLayout (LayoutKind.Explicit)]
public struct NoteMessage
{
  [FieldOffset(0)] public uint PackedMsg;    // 4 bytes long

  [FieldOffset(0)] public byte Channel;      // FieldOffset also at 0
  [FieldOffset(1)] public byte Note;
  [FieldOffset(2)] public byte Velocity;
}
The Channel, Note, and Velocity fields deliberately overlap with the 32-bit packed
message. This allows you to read and write using either. No calculations are 
required to keep other fields in sync: 


NoteMessage n = new NoteMessage(); 
Console.WriteLine (n.PackedMsg) ; // 0 


n.Channel = 10; 

n.Note = 100; 

n.Velocity = 50; 

Console.WriteLine (n.PackedMsg); // 3302410 


n.PackedMsg = 3328010; 
Console.WriteLine (n.Note); // 200 
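For comparison, here is a sketch of the shift-and-mask alternative mentioned earlier, using hypothetical helper names. It assumes the same byte assignment that NoteMessage exposes on a little-endian machine: byte 0 is the channel, byte 1 is the note, and byte 2 is the velocity.

```csharp
using System;

// Hypothetical helpers: pack and unpack a MIDI short message with shifts and masks.
static class MidiPacking
{
    public static uint Pack (byte channel, byte note, byte velocity)
        => (uint) channel | ((uint) note << 8) | ((uint) velocity << 16);

    public static byte Channel  (uint packed) => (byte) (packed & 0xFF);
    public static byte Note     (uint packed) => (byte) ((packed >> 8) & 0xFF);
    public static byte Velocity (uint packed) => (byte) ((packed >> 16) & 0xFF);
}
```

Pack (10, 100, 50) gives 3302410 and Note (3328010) gives 200, matching the NoteMessage output above. One difference is that shifts give the same result regardless of byte order, whereas overlapping FieldOffset fields line up this way only on little-endian hardware.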


Shared Memory 


Memory-mapped files, or shared memory, is a feature in Windows that allows multi- 
ple processes on the same computer to share data. Shared memory is extremely fast 
and, unlike pipes, offers random access to the shared data. We saw in Chapter 15 
how you can use the MemoryMappedFile class to access memory-mapped files; 
bypassing this and calling the Win32 methods directly is a good way to demonstrate 
P/Invoke. 


The Win32 CreateFileMapping function allocates shared memory. You tell it how 
many bytes you need and the name with which to identify the share. Another
application can then subscribe to this memory by calling OpenFileMapping with the
same name. Both methods return a handle, which you can convert to a pointer by
calling MapViewOfFile. Here's a class that encapsulates access to shared memory:


using System;
using System.Runtime.InteropServices;
using System.ComponentModel;

public sealed class SharedMem : IDisposable
{
  // Here we're using enums because they're safer than constants

  enum FileProtection : uint      // constants from winnt.h
  {
    ReadOnly = 2,
    ReadWrite = 4
  }

  enum FileRights : uint          // constants from WinBASE.h
  {
    Read = 4,
    Write = 2,
    ReadWrite = Read + Write
  }

  static readonly IntPtr NoFileHandle = new IntPtr (-1);

  [DllImport ("kernel32.dll", SetLastError = true)]
  static extern IntPtr CreateFileMapping (IntPtr hFile,
                                          IntPtr lpAttributes,   // pointer-sized
                                          FileProtection flProtect,
                                          uint dwMaximumSizeHigh,
                                          uint dwMaximumSizeLow,
                                          string lpName);

  [DllImport ("kernel32.dll", SetLastError = true)]
  static extern IntPtr OpenFileMapping (FileRights dwDesiredAccess,
                                        bool bInheritHandle,
                                        string lpName);

  [DllImport ("kernel32.dll", SetLastError = true)]
  static extern IntPtr MapViewOfFile (IntPtr hFileMappingObject,
                                      FileRights dwDesiredAccess,
                                      uint dwFileOffsetHigh,
                                      uint dwFileOffsetLow,
                                      uint dwNumberOfBytesToMap);

  [DllImport ("kernel32.dll", SetLastError = true)]
  static extern bool UnmapViewOfFile (IntPtr map);

  [DllImport ("kernel32.dll", SetLastError = true)]
  static extern int CloseHandle (IntPtr hObject);

  IntPtr fileHandle, fileMap;

  public IntPtr Root { get { return fileMap; } }

  public SharedMem (string name, bool existing, uint sizeInBytes)
  {
    if (existing)
      fileHandle = OpenFileMapping (FileRights.ReadWrite, false, name);
    else
      fileHandle = CreateFileMapping (NoFileHandle, IntPtr.Zero,
                                      FileProtection.ReadWrite,
                                      0, sizeInBytes, name);
    if (fileHandle == IntPtr.Zero)
      throw new Win32Exception();

    // Obtain a read/write map for the entire file
    fileMap = MapViewOfFile (fileHandle, FileRights.ReadWrite, 0, 0, 0);

    if (fileMap == IntPtr.Zero)
      throw new Win32Exception();
  }

  public void Dispose()
  {
    if (fileMap != IntPtr.Zero) UnmapViewOfFile (fileMap);
    if (fileHandle != IntPtr.Zero) CloseHandle (fileHandle);
    fileMap = fileHandle = IntPtr.Zero;
  }
}
In this example, we set SetLastError=true on the DllImport methods that use the
SetLastError protocol for emitting error codes. This ensures that the
Win32Exception is populated with details of the error when that exception is thrown.
(It also allows you to query the error explicitly by calling
Marshal.GetLastWin32Error.)


To demonstrate this class, we need to run two applications. The first one creates the 
shared memory, as follows: 


using (SharedMem sm = new SharedMem ("MyShare", false, 1000))
{
  IntPtr root = sm.Root;
  // I have shared memory!

  Console.ReadLine();   // Here's where we start a second app...
}


The second application subscribes to the shared memory by constructing a
SharedMem object of the same name, with the existing argument true:


using (SharedMem sm = new SharedMem ("MyShare", true, 1000))
{
  IntPtr root = sm.Root;
  // I have the same shared memory!
  ...
}


The net result is that each program has an IntPtr—a pointer to the same unman- 
aged memory. The two applications now need somehow to read and write to mem- 
ory via this common pointer. One approach is to write a serializable class that 
encapsulates all the shared data and then serialize (and deserialize) the data to the 
unmanaged memory using an UnmanagedMemoryStream. This is inefficient, however, 
if there's a lot of data. Imagine if the shared memory class had a megabyte of data, 
and just one integer needed to be updated. A better approach is to define the shared 
data construct as a struct, and then map it directly into shared memory. We discuss 
this in the following section. 


Mapping a Struct to Unmanaged Memory 


You can directly map a struct with a StructLayout of Sequential or Explicit into 
unmanaged memory. Consider the following struct: 


[StructLayout (LayoutKind.Sequential)]
unsafe struct MySharedData
{
  public int Value;
  public char Letter;
  public fixed float Numbers [50];
}


The fixed directive allows us to define fixed-length value-type arrays inline, and it 
is what takes us into the unsafe realm. Space in this struct is allocated inline for 50 
floating-point numbers. Unlike with standard C# arrays, Numbers is not a reference
to an array—it is the array. If we run the following: 


static unsafe void Main() => Console.WriteLine (sizeof (MySharedData) ); 


the result is 208: 50 four-byte floats, plus the four bytes for the Value integer, plus 
two bytes for the Letter character. The total, 206, is rounded to 208 due to the 
floats being aligned on four-byte boundaries (four bytes being the size of a float). 


We can demonstrate MySharedData in an unsafe context, most simply, with stack- 
allocated memory: 


MySharedData d; 
MySharedData* data = &d; // Get the address of d 


data->Value = 123; 
data->Letter = 'X'; 
data->Numbers[10] = 1.45f; 


or: 


// Allocate the array on the stack: 
MySharedData* data = stackalloc MySharedData[1]; 


data->Value = 123; 

data->Letter = 'X'; 

data->Numbers[10] = 1.45f; 
Of course, we're not demonstrating anything that couldn't otherwise be achieved in 
a managed context. Suppose, however, that we want to store an instance of 
MySharedData on the unmanaged heap, outside the realm of the CLR’s garbage col- 
lector. This is where pointers become really useful: 


MySharedData* data = (MySharedData*) 
Marshal.AllocHGlobal (sizeof (MySharedData)).ToPointer(); 


data->Value = 123; 

data->Letter = 'X'; 

data->Numbers[10] = 1.45f; 
Marshal.AllocHGlobal allocates memory on the unmanaged heap. Here’s how to 
later free the same memory: 


Marshal.FreeHGlobal (new IntPtr (data)); 


(The result of forgetting to free the memory is a good old-fashioned memory leak.) 
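A common way to guarantee that the memory is freed is to pair AllocHGlobal with FreeHGlobal in a try/finally block. In the following sketch (UnmanagedHeapDemo and RoundTripSum are hypothetical names), Marshal.WriteInt32 and Marshal.ReadInt32 access the block without pointers, so no unsafe context is needed.

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedHeapDemo
{
    // Allocates 'count' ints on the unmanaged heap, stores 1..count, reads them
    // back, and always frees the block, even if an exception is thrown.
    public static int RoundTripSum (int count)
    {
        IntPtr block = Marshal.AllocHGlobal (count * sizeof (int));
        try
        {
            for (int i = 0; i < count; i++)
                Marshal.WriteInt32 (block, i * sizeof (int), i + 1);

            int total = 0;
            for (int i = 0; i < count; i++)
                total += Marshal.ReadInt32 (block, i * sizeof (int));
            return total;
        }
        finally
        {
            Marshal.FreeHGlobal (block);   // no leak on any path
        }
    }
}
```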


In keeping with its name, here we use MySharedData in conjunction with the
SharedMem class we wrote in the preceding section. The following program allocates a block
of shared memory, and then maps the MySharedData struct into that memory: 


static unsafe void Main()
{
  using (SharedMem sm = new SharedMem ("MyShare", false,
                          (uint) sizeof (MySharedData)))
  {
    void* root = sm.Root.ToPointer();
    MySharedData* data = (MySharedData*) root;

    data->Value = 123;
    data->Letter = 'X';
    data->Numbers[10] = 1.45f;
    Console.WriteLine ("Written to shared memory");

    Console.ReadLine();

    Console.WriteLine ("Value is " + data->Value);
    Console.WriteLine ("Letter is " + data->Letter);
    Console.WriteLine ("11th Number is " + data->Numbers[10]);
    Console.ReadLine();
  }
}


You can use the built-in MemoryMappedFile class instead of
SharedMem, as follows:

using (MemoryMappedFile mmFile =
       MemoryMappedFile.CreateNew ("MyShare", 1000))
using (MemoryMappedViewAccessor accessor =
       mmFile.CreateViewAccessor())
{
  byte* pointer = null;
  accessor.SafeMemoryMappedViewHandle.AcquirePointer
    (ref pointer);
  try
  {
    void* root = pointer;
    ...
  }
  finally
  {
    accessor.SafeMemoryMappedViewHandle.ReleasePointer();
  }
}


Here’s a second program that attaches to the same shared memory, reading the val- 
ues written by the first program (it must be run while the first program is waiting 
on the ReadLine statement because the shared memory object is disposed upon 
leaving its using statement): 


static unsafe void Main()
{
  using (SharedMem sm = new SharedMem ("MyShare", true,
                          (uint) sizeof (MySharedData)))
  {
    void* root = sm.Root.ToPointer();
    MySharedData* data = (MySharedData*) root;
    Console.WriteLine ("Value is " + data->Value);
    Console.WriteLine ("Letter is " + data->Letter);
    Console.WriteLine ("11th Number is " + data->Numbers[10]);

    // Our turn to update values in shared memory!
    data->Value++;
    data->Letter = '!';
    data->Numbers[10] = 987.5f;
    Console.WriteLine ("Updated shared memory");
    Console.ReadLine();
  }
}


The output from each of these programs is as follows: 


// First program:
Written to shared memory
Value is 124
Letter is !
11th Number is 987.5

// Second program:
Value is 123
Letter is X
11th Number is 1.45
Updated shared memory


Don't be put off by the pointers: C++ programmers use them throughout whole
applications and are able to get everything working. At least most of the time! This 
sort of usage is fairly simple by comparison. 


As it happens, our example is unsafe—quite literally—for another reason. We've not 
considered the thread-safety (or more precisely, process-safety) issues that arise 
with two programs accessing the same memory at once. To use this in a production 
application, we’d need to add the volatile keyword to the Value and Letter fields
in the MySharedData struct to prevent fields from being cached by the Just-In-Time 
(JIT) compiler (or by the hardware in CPU registers). Furthermore, as our interac- 
tion with the fields grew beyond the trivial, we would most likely need to protect 
their access via a cross-process Mutex, just as we would use lock statements to pro- 
tect access to fields in a multithreaded program. We discussed thread safety in detail 
in Chapter 22. 
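As a rough sketch of the cross-process locking described above, a named System.Threading.Mutex can guard each access to the shared struct. The helper SharedLock.Run and the mutex name MyShareMutex are hypothetical; both programs would have to agree on the name and wrap every read or write of MySharedData in the same kind of call.

```csharp
using System;
using System.Threading;

static class SharedLock
{
    // Hypothetical name: every process that touches "MyShare" must use the same one
    const string MutexName = "MyShareMutex";

    // Runs 'action' while holding a named (cross-process) mutex
    public static void Run (Action action)
    {
        using (var mutex = new Mutex (false, MutexName))
        {
            mutex.WaitOne();                  // Blocks while another process holds the mutex
            try { action(); }
            finally { mutex.ReleaseMutex(); } // Always release, even if action throws
        }
    }
}
```

Holding the mutex only around the reads and writes, and not around Console.ReadLine, keeps the other program from blocking indefinitely.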


fixed and fixed {...} 


One limitation of mapping structs directly into memory is that the struct can con- 
tain only unmanaged types. If you need to share string data, for instance, you must 
use a fixed-character array instead. This means manual conversion to and from the 
string type. Here’s how to do it: 


[StructLayout (LayoutKind.Sequential)]
unsafe struct MySharedData
{
  // Allocate space for 200 chars (i.e., 400 bytes).
  const int MessageSize = 200;
  fixed char message [MessageSize];

  // One would most likely put this code into a helper class:
  public string Message
  {
    get { fixed (char* cp = message) return new string (cp); }
    set
    {
      fixed (char* cp = message)
      {
        int i = 0;
        for (; i < value.Length && i < MessageSize - 1; i++)
          cp [i] = value [i];

        // Add the null terminator
        cp [i] = '\0';
      }
    }
  }
}


There's no such thing as a reference to a fixed array; instead, 
you get a pointer. When you index into a fixed array, you're 
actually performing pointer arithmetic! 


With the first use of the fixed keyword, we allocate space, inline, for 200 characters 
in the struct. The same keyword (somewhat confusingly) has a different meaning 
when used later in the property definition. It instructs the CLR to pin an object so 
that, should it decide to perform a garbage collection inside the fixed block, it does not
move the underlying struct about on the memory heap (because its contents are 
being iterated via direct memory pointers). Looking at our program, you might 
wonder how MySharedData could ever shift in memory, given that it resides not on 
the heap, but in the unmanaged world, where the garbage collector has no jurisdic- 
tion. The compiler doesn’t know this, however, and is concerned that we might use 
MySharedData in a managed context, so it insists that we add the fixed keyword to 
make our unsafe code safe in managed contexts. And the compiler does have a 
point—here’s all it would take to put MySharedData on the heap: 


object obj = new MySharedData(); 


This results in a boxed MySharedData—on the heap and eligible for transit during 
garbage collection. 


This example illustrates how a string can be represented in a struct mapped to 
unmanaged memory. For more complex types, you also have the option of using 
existing serialization code. The one proviso is that the serialized data must never 
exceed, in length, its allocation of space in the struct; otherwise, the result is an 
unintended union with subsequent fields. 
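One way to honor that proviso is to measure the serialized form before copying it. The sketch below assumes the data is a string serialized as UTF-8 and uses a hypothetical FixedSlot.Write helper; the Span<byte> parameter stands in for the bytes reserved in the struct, and the method throws instead of spilling into the next field.

```csharp
using System;
using System.Text;

static class FixedSlot
{
    // Encodes 'text' as UTF-8 into 'slot' and returns the number of bytes written.
    // Throws rather than overrunning the space reserved for the field.
    public static int Write (string text, Span<byte> slot)
    {
        int needed = Encoding.UTF8.GetByteCount (text);
        if (needed > slot.Length)
            throw new ArgumentException (
                $"Needs {needed} bytes but only {slot.Length} are reserved", nameof (text));

        return Encoding.UTF8.GetBytes (text, slot);
    }
}
```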


COM Interoperability 


The .NET runtime provides special support for COM, enabling COM objects to be 
used from .NET, and vice versa. COM is available only on Windows. 
The Purpose of COM


COM is an acronym for Component Object Model, a binary standard for interfac- 
ing with libraries, released by Microsoft in 1993. The motivation for inventing COM 
was to enable components to communicate with one another in a language- 
independent and version-tolerant manner. Before COM, the approach in Windows 
was to publish DLLs that declared structures and functions using the C program- 
ming language. Not only is this approach language specific, but it’s also brittle. The 
specification of a type in such a library is inseparable from its implementation: even 
updating a structure with a new field means breaking its specification. 


The beauty of COM was to separate the specification of a type from its underlying 
implementation through a construct known as a COM interface. COM also allowed 
for the calling of methods on stateful objects—rather than being limited to simple 
procedure calls. 


In a way, the .NET programming model is an evolution of the 
principles of COM programming: the .NET platform also 
facilitates cross-language development and allows binary 
components to evolve without breaking applications that 
depend on them. 


The Basics of the COM Type System 


The COM type system revolves around interfaces. A COM interface is rather like 
a .NET interface, but it’s more prevalent because a COM type exposes its functional- 
ity only through an interface. In the .NET world, for instance, we could declare a 
type simply, as follows: 


public class Foo
{
  public string Test() => "Hello, world";
}


Consumers of that type can use Foo directly. And if we later changed the implemen- 
tation of Test(), calling assemblies would not require recompilation. In this 
respect, .NET separates interface from implementation—without requiring inter- 
faces. We could even add an overload without breaking callers: 


public string Test (string s) => $"Hello, world {s}"; 


In the COM world, Foo exposes its functionality through an interface to achieve this 
same decoupling. So, in Foo’s type library, an interface such as this would exist: 


public interface IFoo { string Test(); } 


(We've illustrated this by showing a C# interface—not a COM interface. The princi- 
ple, however, is the same—although the plumbing is different.) 


Callers would then interact with IFoo rather than Foo. 


When it comes to adding the overloaded version of Test, life is more complicated 
with COM than with .NET. First, we would avoid modifying the IFoo interface 
because this would break binary compatibility with the previous version (one of the
principles of COM is that interfaces, once published, are immutable). Second, COM 
doesn't allow method overloading. The solution is to instead have Foo implement a 
second interface: 


public interface IFoo2 { string Test (string s); } 


(Again, we've transliterated this into a .NET interface for familiarity.) 


Supporting multiple interfaces is of key importance in making COM libraries 
versionable. 
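The versioning scheme can be transliterated into C# in the same spirit as the interfaces above. In this sketch, the FooClient.Greet helper is hypothetical: it plays the role of a COM caller that asks the object whether it supports IFoo2 (roughly what QueryInterface does) and falls back to IFoo otherwise.

```csharp
public interface IFoo  { string Test(); }
public interface IFoo2 { string Test (string s); }

// Version 2 of Foo: IFoo is left untouched; the new overload arrives via IFoo2
public class Foo : IFoo, IFoo2
{
    public string Test() => "Hello, world";
    public string Test (string s) => $"Hello, world {s}";
}

public static class FooClient
{
    // Use the newer interface if the object supports it; otherwise fall back
    public static string Greet (IFoo foo, string name) =>
        foo is IFoo2 foo2 ? foo2.Test (name) : foo.Test();
}
```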


IUnknown and IDispatch
All COM interfaces are identified with a Globally Unique Identifier (GUID). 


The root interface in COM is IUnknown—all COM objects must implement it. This 
interface has three methods: 


• AddRef
• Release
• QueryInterface


AddRef and Release are for lifetime management given that COM uses reference 
counting rather than automatic garbage collection (COM was designed to work 
with unmanaged code, where automatic garbage collection isn’t feasible). The QueryInterface
method returns a reference to the requested interface (identified by its GUID), if the
object supports it.
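The reference-counting contract behind AddRef and Release can be imitated in managed code. The RefCounted class below is a hypothetical analogy, not real COM: a genuine COM object frees its own memory when the count reaches zero, whereas this sketch merely sets a Destroyed flag.

```csharp
using System.Threading;

// A managed imitation of IUnknown's lifetime rules (hypothetical; not real COM)
public class RefCounted
{
    int _refCount = 1;   // The creator starts out holding one reference

    public bool Destroyed { get; private set; }

    // Called whenever a new reference to the object is handed out
    public int AddRef() => Interlocked.Increment (ref _refCount);

    // Called when a holder is done; the last Release tears the object down
    public int Release()
    {
        int remaining = Interlocked.Decrement (ref _refCount);
        if (remaining == 0) Destroyed = true;   // A real COM object would free itself here
        return remaining;
    }
}
```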


To enable dynamic programming (e.g., scripting and automation), a COM object 
can also implement IDispatch. This enables dynamic languages such as VBScript to 
call COM objects in a late-bound manner—rather like dynamic in C# (although 
only for simple invocations). 


Calling a COM Component from C# 


The CLR’s built-in support for COM means that you don't work directly with 
IUnknown and IDispatch. Instead, you work with CLR objects and the runtime 
marshals your calls to the COM world via Runtime-Callable Wrappers (RCWs). The 
runtime also handles lifetime management by calling AddRef and Release (when 
the .NET object is finalized) and takes care of the primitive type conversions 
between the two worlds. Type conversion ensures that each side sees, for example, 
the integer and string types in their familiar forms. 


Additionally, there needs to be some way to access RCWs in a statically typed fash- 
ion. This is the job of COM interop types. COM interop types are automatically gen- 
erated proxy types that expose a .NET member for each COM member. The type 
library importer tool (tlbimp.exe) generates COM interop types from the command 
line, based on a COM library that you choose, and compiles them into a COM
interop assembly. 


If a COM component implements multiple interfaces, the 
tlbimp.exe tool generates a single type that contains a union of 
members from all interfaces. 


You can create a COM interop assembly in Visual Studio by going to the Add Refer- 
ence dialog box and choosing a library from the COM tab. For example, if you have 
Microsoft Excel installed, adding a reference to the Microsoft Excel Object Library 
allows you to interoperate with Excel’s COM classes. Here's the C# code to create 
and show a workbook, and then populate a cell in that workbook: 


using System;
using Excel = Microsoft.Office.Interop.Excel;

class Program
{
  static void Main()
  {
    var excel = new Excel.Application();
    excel.Visible = true;
    Excel.Workbook workBook = excel.Workbooks.Add();
    ((Excel.Range)excel.Cells[1, 1]).Font.FontStyle = "Bold";
    ((Excel.Range)excel.Cells[1, 1]).Value2 = "Hello World";
    workBook.SaveAs (@"d:\temp.xlsx");
  }
}


It is currently necessary to embed interop types in your application
(otherwise, .NET Core won’t locate them at runtime).
Either click the COM reference in Visual Studio’s Solution 
Explorer and set the Embed Interop Types property to true in 
the Properties window, or open your .csproj file and add the 
following line (in boldface): 


<ItemGroup>
  <COMReference Include="Microsoft.Office.Excel.dll">
    <EmbedInteropTypes>true</EmbedInteropTypes>
  </COMReference>
</ItemGroup>


The Excel.Application class is a COM interop type whose runtime type is an 
RCW. When we access the Workbooks and Cells properties, we get back more 
interop types. 


Optional Parameters and Named Arguments 


Because COM APIs don’t support function overloading, it’s very common to have 
functions with numerous parameters, many of which are optional. For instance, 
here's how you might call an Excel workbook’s SaveAs method:
var missing = System.Reflection.Missing.Value;

workBook.SaveAs (@"d:\temp.xlsx", missing, missing, missing, missing,
   missing, Excel.XlSaveAsAccessMode.xlNoChange, missing, missing,
   missing, missing, missing);

The good news is that C#’s support for optional parameters is COM-aware, so we 
can just do this: 


workBook.SaveAs (@"d:\temp.xlsx"); 


(As we stated in Chapter 3, optional parameters are “expanded” by the compiler into 
the full verbose form.) 


Named arguments allow you to specify additional arguments, regardless of their 
position: 


workBook.SaveAs (@"c:\test.xlsx", Password:"foo") ; 
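The same compiler features work on ordinary C# methods, which makes the expansion easy to observe. FakeWorkbook.SaveAs below is a hypothetical stand-in with only a handful of parameters (the real Excel method has many more); each omitted argument is filled in with its declared default.

```csharp
public static class FakeWorkbook
{
    // Hypothetical stand-in for a COM method with many optional parameters.
    // Returns the values it received so the compiler's choices are visible.
    public static string SaveAs (string fileName, string fileFormat = "xlsx",
                                 string password = null, bool readOnlyRecommended = false)
        => $"{fileName}|{fileFormat}|{password ?? "(none)"}|{readOnlyRecommended}";
}
```

Calling FakeWorkbook.SaveAs ("a.xlsx", password: "foo") skips fileFormat entirely and still lands "foo" in the third parameter.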


Implicit ref Parameters 


Some COM APIs (Microsoft Word, in particular) expose functions that declare 
every parameter as pass-by-reference—whether or not the function modifies the 
parameter value. This is because of the perceived performance gain from not copy- 
ing argument values (the real performance gain is negligible). 


Historically, calling such methods from C# has been clumsy because you must spec- 
ify the ref keyword with every argument, and this prevents the use of optional 
parameters. For instance, to open a Word document, we used to have to do this: 


object filename = "foo.doc";
object notUsed1 = Missing.Value;
object notUsed2 = Missing.Value;
object notUsed3 = Missing.Value;
...
Open (ref filename, ref notUsed1, ref notUsed2, ref notUsed3, ...);


Thanks to implicit ref parameters, you can omit the ref modifier on COM function 
calls, allowing the use of optional parameters: 


word.Open ("foo.doc"); 


The caveat is that you will get neither a compile-time nor a runtime error if the 
COM method you're calling actually does mutate an argument value. 
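The reason for the caveat can be sketched in plain C#. Conceptually, when you omit ref on a COM call, the compiler copies your argument into a hidden temporary and passes that temporary by reference, so any change the method makes lands in the temporary. ImplicitRefDemo and its members are hypothetical names used to imitate this by hand.

```csharp
public static class ImplicitRefDemo
{
    // In the style of Word's Open: the parameter is passed by reference,
    // and this method happens to modify it
    public static void Open (ref object fileName) => fileName = "changed.doc";

    // Hand-written imitation of a call with 'ref' omitted:
    // the argument is copied into a temporary that is passed by reference
    public static object CallWithoutRef (object fileName)
    {
        object temp = fileName;
        Open (ref temp);
        return fileName;   // The caller's value is unchanged; the update went to temp
    }
}
```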


Indexers 


The ability to omit the ref modifier has another benefit: it makes COM indexers 
with ref parameters accessible via ordinary C# indexer syntax. This would other- 
wise be forbidden because ref/out parameters are not supported with C# indexers. 


You can also call COM properties that accept arguments. In the following example, 
Foo is a property that accepts an integer argument: 


myComObject.Foo [123] = "Hello"; 
Writing such properties yourself in C# is still prohibited: a type can expose an
indexer only on itself (the default indexer). Therefore, if you wanted to write code in 
C# that would make the preceding statement legal, Foo would need to return 
another type that exposed a (default) indexer. 
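Here is a minimal sketch of that workaround. The names FooValues and MyObject are hypothetical; the Foo property returns an object whose default indexer does the real work, so a statement of the form myObject.Foo [123] = "Hello" compiles.

```csharp
using System.Collections.Generic;

// The object returned by the Foo property exposes the (default) indexer
public class FooValues
{
    readonly Dictionary<int, string> _values = new Dictionary<int, string>();

    public string this [int index]
    {
        get => _values.TryGetValue (index, out var v) ? v : null;
        set => _values [index] = value;
    }
}

public class MyObject
{
    // A property that returns an indexable type
    public FooValues Foo { get; } = new FooValues();
}
```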


Dynamic Binding 


Dynamically binding on COM types is currently unavailable 
in .NET Core 3. This functionality was originally scheduled 
for inclusion but then moved to a later release. We've included 
the material here because it’s likely to be available in the next 
major release of .NET Core, and to help you understand and 
update code written for .NET Framework (which does sup- 
port dynamic binding with COM). 


There are two ways that dynamic binding can help when calling COM components. 


The first way is in allowing access to a COM component without a COM interop 
type. To do this, call Type.GetTypeFromProgID with the COM component name to 
obtain a COM instance, and then use dynamic binding to call members from then 
on. Of course, there’s no IntelliSense, and compile-time checks are impossible: 


Type excelAppType = Type.GetTypeFromProgID ("Excel.Application", true);
dynamic excel = Activator.CreateInstance (excelAppType);
excel.Visible = true;
dynamic wb = excel.Workbooks.Add();
excel.Cells [1, 1].Value2 = "foo";


(The same thing can be achieved, much more clumsily, with reflection instead of 
dynamic binding.) 
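For comparison, here is the reflection route next to the dynamic one, applied to an ordinary .NET object rather than a COM component. Greeter and LateBinding are hypothetical names; InvokeMember takes the member name as a string and the arguments as an object array, which is what makes it clumsier.

```csharp
using System.Reflection;

public class Greeter
{
    public string Greet (string name) => "Hi " + name;
}

public static class LateBinding
{
    // Late binding through reflection: member name as a string, arguments as object[]
    public static object ViaReflection (object target, string method, params object[] args) =>
        target.GetType().InvokeMember (method,
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null, target, args);

    // The same call through dynamic binding
    public static object ViaDynamic (dynamic target, string name) => target.Greet (name);
}
```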


A variation of this theme is calling a COM component that 
supports only IDispatch. Such components are quite rare, 
however. 


Dynamic binding can also be useful (to a lesser extent) in dealing with the COM 
variant type. For reasons due more to poor design than necessity, COM API functions
are often peppered with this type, which is roughly equivalent to object
in .NET. If you enable “Embed Interop Types” in your project (more on this soon), 
the runtime will map variant to dynamic, instead of mapping variant to object, 
avoiding the need for casts. For instance, you could legally do this: 


excel.Cells [1, 1].Font.FontStyle = "Bold"; 
instead of the route you must take in .NET Core 3: 


var range = (Excel.Range) excel.Cells [1, 1]; 

range.Font.FontStyle = "Bold"; 
The disadvantage of working in this way is that you lose autocompletion, so you 
must know that a property called Font happens to exist. For this reason, it’s usually 
easier to dynamically assign the result to its known interop type: 
Excel.Range range = excel.Cells [1, 1];
range.Font.FontStyle = "Bold"; 


As you can see, this saves only five characters over the old-fashioned approach! 


The mapping of variant to dynamic is the default, and is a function of enabling 
Embed Interop Types on a reference. 


Embedding Interop Types 


We said previously that C# ordinarily calls COM components via interop types that 
are generated by calling the tlbimp.exe tool (directly, or via Visual Studio). 


Historically, your only option was to reference interop assemblies just as you would 
with any other assembly. This could be troublesome because interop assemblies can 
get quite large with complex COM components. A tiny add-in for Microsoft Word, 
for instance, requires an interop assembly that is orders of magnitude larger than 
itself. 


Rather than referencing an interop assembly, you have the option of embedding the 
portions that use it. The compiler analyzes the assembly to work out precisely the 
types and members that your application requires, and embeds definitions for (just) 
those types and members directly in your application. This avoids bloat as well as 
the need to ship an additional file. 


To enable this feature, either select the COM reference in Visual Studio’s Solution 
Explorer and then set Embed Interop Types to true in the Properties window, or 
edit your .csproj file as we described earlier (see “Calling a COM Component from 
C#” on page 990). 


Type Equivalence 


The CLR supports type equivalence for linked interop types. This means that if two 
assemblies each link to an interop type, those types will be considered equivalent if 
they wrap the same COM type. This holds true even if the interop assemblies to 
which they linked were generated independently. 


Type equivalence relies on the TypeIdentifierAttribute 
attribute in the System.Runtime.InteropServices name- 
space. The compiler automatically applies this attribute when 
you link to interop assemblies. COM types are then consid- 
ered equivalent if they have the same GUID. 
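The GUID that this equivalence rests on is visible from .NET through Type.GUID. The sketch below reuses the IServer declaration shown later in this chapter purely as an illustration; it demonstrates only where the identity comes from, not type equivalence itself, which additionally requires embedded interop types.

```csharp
using System.Runtime.InteropServices;

// COM identifies this interface by the GUID below, not by its .NET name or assembly
[ComVisible (true)]
[Guid ("226E5561-C68E-4B2B-BD28-25103ABCA3B1")]
public interface IServer
{
    ulong Fibonacci (ulong whichTerm);
}
```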


Exposing C# Objects to COM 


It’s also possible to write classes in C# that can be consumed in the COM world. The 
CLR makes this possible through a proxy called a COM-Callable Wrapper (CCW). 
A CCW marshals types between the two worlds (as with an RCW) and implements 
IUnknown (and optionally IDispatch) as required by the COM protocol. A CCW is 
lifetime-controlled from the COM side via reference counting (rather than through
the CLR’s garbage collector). 


You can expose any public class to COM (as an in-proc server). To do so, first create 
an interface, assign it a unique GUID (in Visual Studio, you can use Tools > Create 
GUID), declare it visible to COM, and then set the interface type: 


using System.Runtime.InteropServices;

namespace MyCom
{
  [ComVisible(true)]
  [Guid ("226E5561-C68E-4B2B-BD28-25103ABCA3B1")]   // Change this GUID
  [InterfaceType (ComInterfaceType.InterfaceIsIUnknown)]
  public interface IServer
  {
    ulong Fibonacci (ulong whichTerm);
  }
}


Next, provide an implementation of your interface, assigning a unique GUID to that 
implementation: 


using System;
using System.Runtime.InteropServices;

namespace MyCom
{
  [ComVisible(true)]
  [Guid ("09E01FCD-9970-4DB3-B537-0EC555967DD9")]   // Change this GUID
  public class Server : IServer
  {
    public ulong Fibonacci (ulong whichTerm)
    {
      if (whichTerm < 1) throw new ArgumentException ("...");
      ulong a = 0;
      ulong b = 1;
      for (ulong i = 0; i < whichTerm; i++)
      {
        ulong tmp = a;
        a = b;
        b = tmp + b;
      }
      return a;
    }
  }
}


Edit your .csproj file, adding the following line (in boldface): 


<PropertyGroup> 
<TargetFramework>netcoreapp3.0</TargetFramework> 
<EnableComHosting>true</EnableComHosting> 
</PropertyGroup> 
Now, when you build your project, an additional file is generated, MyCom.comhost.dll,
which can be registered for COM interop. (Keep in mind that the file will
always be 32 bit or 64 bit depending on your project configuration: there's no such 
thing as “Any CPU” in this scenario.) From an elevated command prompt, switch to 
the directory holding your DLL and run regsvr32 MyCom.comhost.dll. 
You can then consume your COM component from most COM-capable languages.
For example, you can create this Visual Basic Script in a text editor and run it by 
double-clicking the file in Windows Explorer, or by starting it from a command 
prompt as you would a program: 


REM Save file as ComClient.vbs
Dim obj
Set obj = CreateObject("MyCom.Server")
result = obj.Fibonacci(12)
Wscript.Echo result


Note that .NET Framework and .NET Core cannot be loaded into the same process. 
Therefore, a .NET Core COM server cannot be loaded into a .NET Framework
COM client process, or vice versa. 


Enabling Registry-Free COM 


Traditionally, COM adds type information to the registry. Registry-free COM uses a 
manifest file instead of the registry to control object activation. To enable this fea- 
ture, add the following line (in boldface) to your .csproj file: 


<PropertyGroup> 
<TargetFramework>netcoreapp3.0</TargetFramework> 
<EnableComHosting>true</EnableComHosting> 
<EnableRegFreeCom>true</EnableRegFreeCom> 
</PropertyGroup> 


Your build will then generate MyCom.X.manifest. 


There is no support in .NET Core 3 for generating a COM 
type library (*.tlb). You can manually write an Interface Defi-
nition Language (IDL) file or C++ header for the native decla- 
rations in your interface. 
26


Regular Expressions 








The regular expressions language identifies character patterns. The .NET types sup- 
porting regular expressions are based on Perl 5 regular expressions and support 
both search and search/replace functionality. 


Regular expressions are used for tasks such as: 


• Validating text input such as passwords and phone numbers
• Parsing textual data into more structured forms (e.g., a NuGet version string)
• Replacing patterns of text in a document (e.g., whole words only)


This chapter is split into both conceptual sections teaching the basics of regular 
expressions in .NET, and reference sections describing the regular expressions 
language. 


All regular expression types are defined in System.Text.RegularExpressions.


The samples in this chapter are all preloaded into LINQPad, 
which also includes an interactive RegEx tool (press Ctrl+Shift+F1).
An online tool is also available.


Regular Expression Basics 


One of the most common regular expression operators is a quantifier. ? is a quanti- 
fier that matches the preceding item 0 or 1 time. In other words, ? means optional. 
An item is either a single character or a complex structure of characters in square 
brackets. For example, the regular expression "colou?r" matches color and 
colour, but not colouur: 


Console.WriteLine (Regex.Match ("color",  @"colou?r").Success); // True 
Console.WriteLine (Regex.Match ("colour", @"colou?r").Success); // True 
Console.WriteLine (Regex.Match ("colouur", @"colou?r").Success); // False 
Regex.Match searches within a larger string. The object that it returns has
properties for the Index and Length of the match as well as the actual Value matched:


Match m = Regex.Match ("any colour you like", @"colou?r"); 


Console.WriteLine (m.Success);     // True
Console.WriteLine (m.Index);       // 4
Console.WriteLine (m.Length);      // 6
Console.WriteLine (m.Value);       // colour
Console.WriteLine (m.ToString());  // colour


You can think of Regex.Match as a more powerful version of the string’s IndexOf 
method. The difference is that it searches for a pattern rather than a literal string. 


The IsMatch method is a shortcut for calling Match and then testing the Success 
property. 
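Because IsMatch (like Match) searches within the input, a pattern that is meant to validate a whole string usually needs the ^ and $ anchors, which are covered later under zero-width assertions. ColorCheck is a hypothetical helper showing both forms.

```csharp
using System.Text.RegularExpressions;

public static class ColorCheck
{
    // Same result as Regex.Match (input, @"colou?r").Success
    public static bool Contains (string input) => Regex.IsMatch (input, @"colou?r");

    // ^ and $ anchor the pattern, so the entire input must be the word
    public static bool IsExactly (string input) => Regex.IsMatch (input, @"^colou?r$");
}
```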

The regular expressions engine works from left to right by default, so only the leftmost
match is returned. You can use the NextMatch method to return more
matches:


Match m1 = Regex.Match ("One color? There are two colours in my head!",
                        @"colou?rs?");
Match m2 = m1.NextMatch();
Console.WriteLine (m1);   // color
Console.WriteLine (m2);   // colours


The Matches method returns all matches in a MatchCollection. We can rewrite the
preceding example as follows:


foreach (Match m in Regex.Matches 
("One color? There are two colours in my head!", @"colou?rs?")) 
Console.WriteLine (m); 
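If you want the matched text rather than Match objects, LINQ can project the collection. ColorFinder.FindAll is a hypothetical helper; Matches returns a MatchCollection, and Cast<Match>() makes it queryable on every .NET version.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ColorFinder
{
    // Collects the Value of every match, in order of appearance
    public static string[] FindAll (string text) =>
        Regex.Matches (text, @"colou?rs?")
             .Cast<Match>()
             .Select (m => m.Value)
             .ToArray();
}
```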


Another common regular expressions operator is the alternator, expressed with a 
vertical bar, |. An alternator expresses alternatives. The following matches “Jen,” 
“Jenny,” and “Jennifer”:


Console.WriteLine (Regex.IsMatch ("Jenny", "Jen(ny|nifer)?")); // True 


The brackets around an alternator separate the alternatives from the rest of the 
expression. 


You can specify a timeout when matching regular expressions. 
If a match operation takes longer than the specified TimeSpan, 
a RegexMatchTimeoutException is thrown. This can be useful 
if your program processes arbitrary regular expressions (for 
instance, in an advanced search dialog box) because it pre- 
vents malformed regular expressions from infinitely spinning. 
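A minimal sketch of the timeout option, assuming a hypothetical SafeSearch helper: the static Regex.Match overload that accepts a TimeSpan throws RegexMatchTimeoutException when the limit is exceeded, and the helper converts that into a null result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeSearch
{
    // Returns the first match, or null if there is none or the search times out
    public static string FirstMatchOrNull (string input, string pattern, TimeSpan timeout)
    {
        try
        {
            Match m = Regex.Match (input, pattern, RegexOptions.None, timeout);
            return m.Success ? m.Value : null;
        }
        catch (RegexMatchTimeoutException)
        {
            return null;   // Treat a runaway pattern the same as "no result"
        }
    }
}
```

Returning null treats a runaway pattern like a failed search; a search dialog might instead report the timeout to the user.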
Compiled Regular Expressions


In some of the preceding examples, we called a static Regex method repeatedly with
the same pattern. An alternative approach in these cases is to instantiate a Regex 
object with the pattern and RegexOptions.Compiled and then call instance 
methods: 


Regex r = new Regex (@"sausages?", RegexOptions.Compiled);
Console.WriteLine (r.Match ("sausage"));    // sausage
Console.WriteLine (r.Match ("sausages"));   // sausages


RegexOptions.Compiled instructs the Regex instance to use lightweight code generation
(DynamicMethod in Reflection.Emit) to dynamically build and compile code
tailored to that particular regular expression. This results in faster matching, at the
expense of an initial compilation cost.


You can also instantiate a Regex object without using RegexOptions.Compiled. A 
Regex instance is immutable. 
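Because a Regex is immutable (and documented as safe to share across threads), a common pattern is to build a compiled instance once and store it in a static field. Patterns.Sausage is a hypothetical name for such a field.

```csharp
using System.Text.RegularExpressions;

public static class Patterns
{
    // Built (and compiled) once, then reused by every caller
    public static readonly Regex Sausage =
        new Regex (@"sausages?", RegexOptions.Compiled);
}
```

The one-time compilation cost is then paid on first use of the Patterns class rather than on every call.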


The regular expressions engine is fast. Even without compila- 
tion, a simple match typically takes less than a microsecond. 


RegexOptions 


The RegexOptions flags enum lets you tweak matching behavior. A common use for 
RegexOptions is to perform a case-insensitive search: 


Console.WriteLine (Regex.Match ("a", "A", RegexOptions.IgnoreCase)); // a 


This applies the current culture's rules for case equivalence. The CultureInvariant 
flag lets you request the invariant culture instead: 


Console.WriteLine (Regex.Match ("a", "A", RegexOptions.IgnoreCase 
| RegexOptions.CultureInvariant) ); 


You can activate most of the RegexOptions flags within a regular expression itself, 
using a single-letter code, as follows: 


Console.WriteLine (Regex.Match ("a", @"(?i)A"));   // a
You can turn options on and off throughout an expression: 
Console.WriteLine (Regex.Match ("AAAa", @"(?i)a(?-i)a")); // Aa 


Another useful option is IgnorePatternWhitespace or (?x). This allows you to 
insert whitespace to make a regular expression more readable—without the white- 
space being taken literally. 
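As an illustration, the pattern below is spread out with spaces and ends with a # comment, both of which (?x) tells the engine to ignore. PhonePattern is a hypothetical name, and the pattern is equivalent to \d{3}-\d{4}.

```csharp
using System.Text.RegularExpressions;

public static class PhonePattern
{
    // With (?x), unescaped spaces are ignored and # starts a comment
    public const string Spaced = @"(?x) \d{3}  -  \d{4}   # e.g., 555-1234";

    public static bool IsPhone (string s) => Regex.IsMatch (s, Spaced);
}
```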


Table 26-1 lists all RegexOptions values along with their single-letter codes.
Table 26-1. Regular expression options

Enum value                Code   Description
None
IgnoreCase                i      Ignores case (by default, regular expressions are case
                                 sensitive)
Multiline                 m      Changes ^ and $ so that they match the start/end of a
                                 line instead of start/end of the string
ExplicitCapture           n      Captures only explicitly named or explicitly numbered
                                 groups (see “Groups” on page 1006)
Compiled                         Forces compilation to IL (see “Compiled Regular
                                 Expressions” on page 999)
Singleline                s      Makes . match every character (instead of matching
                                 every character except \n)
IgnorePatternWhitespace   x      Eliminates unescaped whitespace from the pattern
RightToLeft               r      Searches from right to left; can’t be specified midstream
ECMAScript                       Forces ECMA compliance (by default, the
                                 implementation is not ECMA compliant)
CultureInvariant                 Turns off culture-specific behavior for string comparisons


Character Escapes


Regular expressions have the following metacharacters, which have a special rather 
than literal meaning: 


\  *  +  ?  |  {  [  (  )  ^  $  .  #


To use a metacharacter literally, you must prefix, or escape, the character with a 
backslash. In the following example, we escape the ? character to match the string 
"what?": 


Console.WriteLine (Regex.Match ("what?", @"what\?")); // what? (correct) 
Console.WriteLine (Regex.Match ("what?", @"what?")); // what (incorrect) 


If the character is inside a set (square brackets), this rule does 
not apply, and the metacharacters are interpreted literally. We 
discuss sets in the following section. 


The Regex’s Escape and Unescape methods convert a string containing regular 
expression metacharacters by replacing them with escaped equivalents, and vice 
versa: 


Console.WriteLine (Regex.Escape (@"?"));      // \?
Console.WriteLine (Regex.Unescape (@"\?"));   // ?
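Escape is particularly useful when part of a pattern comes from user input that should be matched literally. LiteralSearch.ContainsLiteral is a hypothetical helper; without the call to Escape, the + and ? in the search text would act as quantifiers.

```csharp
using System.Text.RegularExpressions;

public static class LiteralSearch
{
    // Escapes metacharacters so that user-supplied text is matched literally
    public static bool ContainsLiteral (string input, string userText) =>
        Regex.IsMatch (input, Regex.Escape (userText));
}
```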
We express all the regular expression strings in this chapter as C# verbatim (@) string
literals. This bypasses C#’s escape mechanism, which also uses the backslash. Without
the @, a literal backslash would require four backslashes:


Console.WriteLine (Regex.Match ("\\", "\\\\"));   // \


Unless you include the (?x) option, spaces are treated literally in regular 
expressions: 


Console.Write (Regex.IsMatch ("hello world", @"hello world")); // True 


Character Sets 


Character sets act as wildcards for a particular set of characters: 


Expression      Meaning                                                     Inverse ("not")
[abcdef]        Matches a single character in the list.                     [^abcdef]
[a-f]           Matches a single character in a range.                      [^a-f]
\d              Matches anything in the Unicode digits category.            \D
                In ECMAScript mode, [0-9].
\w              Matches a word character (by default, varies according to   \W
                CultureInfo.CurrentCulture; for example, in English, same
                as [a-zA-Z_0-9]).
\s              Matches a whitespace character; that is, anything for       \S
                which char.IsWhiteSpace returns true (including Unicode
                spaces). In ECMAScript mode, [\n\r\t\f\v ].
\p{category}    Matches a character in a specified category.                \P
.               (Default mode) Matches any character except \n.             \n
.               (SingleLine mode) Matches any character.                    \n





To match exactly one of a set of characters, put the character set in square brackets: 
Console.Write (Regex.Matches ("That is that.", "[Tt]hat").Count); // 2 


To match any character except those in a set, put the set in square brackets with a ^
symbol before the first character:


Console.Write (Regex.Match ("quiz qwerty", "q[^aeiou]").Index);   // 5


You can specify a range of characters by using a hyphen. The following regular 
expression matches a chess move: 


Console.Write (Regex.Match ("b1-c4", @"[a-h]\d-[a-h]\d").Success); // True 


\d indicates a digit character, so \d will match any digit. \D matches any nondigit 
character. 
\w indicates a word character, which includes letters, numbers, and the underscore.
\W matches any nonword character. These work as expected for non-English letters, 
too, such as Cyrillic. 


. matches any character except \n (but allows \r). 
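A quick way to check these rules is to test them directly. ClassDemo is a hypothetical helper: the first method shows \w accepting Cyrillic letters, and the second shows . refusing to cross \n unless RegexOptions.Singleline is specified.

```csharp
using System.Text.RegularExpressions;

public static class ClassDemo
{
    // \w accepts letters from any script, not just A-Z
    public static bool IsWord (string s) => Regex.IsMatch (s, @"^\w+$");

    // . stops at \n (but not at \r) unless Singleline is specified
    public static bool DotSpansNewline (string s, RegexOptions options) =>
        Regex.IsMatch (s, "a.b", options);
}
```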


\p matches a character in a specified category, such as {Lu} for uppercase letter or 
{P} for punctuation (we list the categories in the reference section later in the 
chapter): 


Console.Write (Regex.IsMatch ("Yes, please", @"\p{P}")); // True 


We will find more uses for \d, \w, and . when we combine them with quantifiers. 


Quantifiers 


Quantifiers match an item a specified number of times: 


Quantifier   Meaning
*            Zero or more matches
+            One or more matches
?            Zero or one match
{n}          Exactly n matches
{n,}         At least n matches
{n,m}        Between n and m matches





The * quantifier matches the preceding character or group zero or more times. The 
following matches cv.docx, along with any numbered versions of the same file (e.g., 
cv2.docx, cv15.docx): 


Console.Write (Regex.Match ("cv15.docx", @"cv\d*\.docx").Success); // True 


Notice that we must escape the period in the file extension using a backslash. 


The following allows anything between cv and .docx and is equivalent to dir
cv*.docx:


Console.Write (Regex.Match ("cvjoint.docx", @"cv.*\.docx").Success); // True 


The + quantifier matches the preceding character or group one or more times; for 
example: 


Console.Write (Regex.Matches ("slow! yeah slooow!", "slo+w").Count);   // 2


The {} quantifier matches a specified number (or range) of repetitions. The follow- 
ing matches a blood pressure reading: 


Regex bp = new Regex (@"\d{2,3}/\d{2,3}"); 
Console.WriteLine (bp.Match ("It used to be 160/110")); // 160/110 
Console.WriteLine (bp.Match ("Now it's only 115/75")); // 115/75 
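The following sketch exercises each quantifier from the table against a few made-up inputs. To keep the effect of each quantifier visible, a small helper (Whole, a name invented here) wraps the pattern in the ^ and $ anchors described later in this chapter, so that the pattern must account for the entire input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only if the pattern matches the entire input
bool Whole (string input, string pattern) =>
  Regex.IsMatch (input, "^(?:" + pattern + ")$");

bool[] results =
{
  Whole ("ac",     "ab*c"),      // *     zero b's are allowed          True
  Whole ("ac",     "ab+c"),      // +     at least one b is required    False
  Whole ("colour", "colou?r"),   // ?     the u is optional             True
  Whole ("2024",   @"\d{4}"),    // {n}   exactly four digits           True
  Whole ("12",     @"\d{3,}"),   // {n,}  at least three digits         False
  Whole ("12345",  @"\d{2,4}"),  // {n,m} two to four digits            False
};
Console.WriteLine (string.Join (" ", results));  // True False True True False False
```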
Greedy Versus Lazy Quantifiers


By default, quantifiers are greedy, as opposed to lazy. A greedy quantifier repeats as 
many times as it can before advancing. A lazy quantifier repeats as few times as it 
can before advancing. You can make any quantifier lazy by suffixing it with the ? 
symbol. To illustrate the difference, consider the following HTML fragment: 


string html = "<i>By default</i> quantifiers are <i>greedy</i> creatures"; 


Suppose that we want to extract the two phrases in italics. If we execute the 
following: 


foreach (Match m in Regex.Matches (html, @"<i>.*</i>")) 
Console.WriteLine (m); 


the result is not two matches, but a single match: 
<i>By default</i> quantifiers are <i>greedy</i> 


The problem is that our * quantifier greedily repeats as many times as it can before 
matching </i>. So, it passes right by the first </i>, stopping only at the final </i> 
(the last point at which the rest of the expression can still match). 


If we make the quantifier lazy, the * bails out at the first point at which the rest of 
the expression can match: 


foreach (Match m in Regex.Matches (html, @"<i>.*?</i>")) 
Console.WriteLine (m); 


Here’s the result: 


<i>By default</i> 
<i>greedy</i> 
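The sketch below counts the matches from both patterns above and repeats the comparison with a bounded {n,m} quantifier. It also tries a negated character class, [^<]*, which is a common alternative to a lazy quantifier (not one the text uses here) that simply cannot run past the next '<'.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<i>By default</i> quantifiers are <i>greedy</i> creatures";

int greedy  = Regex.Matches (html, @"<i>.*</i>").Count;     // 1: runs on to the last </i>
int lazy    = Regex.Matches (html, @"<i>.*?</i>").Count;    // 2: stops at the first </i>
int negated = Regex.Matches (html, @"<i>[^<]*</i>").Count;  // 2: [^<] can't cross a '<'

// The same contrast with a bounded quantifier
string most  = Regex.Match ("aaaa", "a{2,3}").Value;   // aaa
string least = Regex.Match ("aaaa", "a{2,3}?").Value;  // aa

Console.WriteLine ($"{greedy} {lazy} {negated} {most} {least}");
```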


Zero-Width Assertions 


The regular expressions language lets you place conditions on what should occur 
before or after a match, through lookbehind, lookahead, anchors, and word
boundaries. These are called zero-width assertions, because they don’t increase the width
(or length) of the match itself. 


Lookahead and Lookbehind 


The (?=expr) construct checks whether the text that follows matches expr, without 
including expr in the result. This is called positive lookahead. In the following
example, we look for a number followed by the word miles:


Console.WriteLine (Regex.Match ("say 25 miles more", @"\d+\s(?=miles)")); 


OUTPUT: 25 


Notice the word “miles” was not returned in the result, even though it was required 
to satisfy the match. 
After a successful lookahead, matching continues as though the sneak preview never
took place. So, if we append .* to our expression like this: 


Console.WriteLine (Regex.Match ("say 25 miles more", @"\d+\s(?=miles).*")); 
the result is 25 miles more. 


Lookahead can be useful in enforcing rules for a strong password. Suppose that a 
password must be at least six characters and contain at least one digit. With a 
lookahead, we could achieve this as follows:


string password = "...";
bool ok = Regex.IsMatch (password, @"(?=.*\d).{6,}"); 


This first performs a lookahead to ensure that a digit occurs somewhere in the 
string. If satisfied, it returns to its position before the sneak preview began and 
matches six or more characters. (In “Cookbook Regular Expressions” on page 1009, 
we include a more substantial password validation example.) 
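Several lookaheads can be stacked at the start of an expression, each enforcing one rule. The following is a sketch of that idea with a hypothetical policy (at least eight characters, at least one digit, and at least one uppercase letter); the ^ and $ anchors, covered shortly, make the length check apply to the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical policy: 8+ characters, at least one digit, at least one uppercase letter.
// Each (?=...) tests one rule from the start without consuming input;
// .{8,}$ then checks the length of the whole string.
const string policy = @"^(?=.*\d)(?=.*\p{Lu}).{8,}$";

string[] candidates = { "password", "Password", "passw0rd", "Passw0rd", "Pa55" };
bool[] accepted = Array.ConvertAll (candidates, c => Regex.IsMatch (c, policy));

for (int i = 0; i < candidates.Length; i++)
  Console.WriteLine ($"{candidates[i]}: {accepted[i]}");   // only Passw0rd is True
```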


The opposite is the negative lookahead construct, (?!expr). This requires that the 
match not be followed by expr. The following expression matches “good”—unless 
“however” or “but” appears later in the string: 


string regex = "(?i)good(?!.*(however|but))";
Console.WriteLine (Regex.IsMatch ("Good work! But...", regex)); // False 
Console.WriteLine (Regex.IsMatch ("Good work! Thanks!", regex)); // True 


The (?<=expr) construct denotes positive lookbehind and requires that a match be
preceded by a specified expression. The opposite construct, (?<!expr), denotes
negative lookbehind and requires that a match not be preceded by a specified expression.
For example, the following matches “good”, unless “however” appears earlier in the
string:

string regex = "(?i)(?<!however.*)good";
Console.WriteLine (Regex.IsMatch ("However good, we...", regex)); // False
Console.WriteLine (Regex.IsMatch ("Very good, thanks!", regex)); // True


We could improve these examples by adding word boundary assertions, which we 
introduce shortly. 
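Lookbehind is handy for extracting a value that follows a marker without including the marker. The sketch below uses a made-up invoice string: the positive lookbehind returns amounts that follow a '$', and the negative lookbehind returns digit runs that follow neither a '$', another digit, nor a period.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Total: $42.50, tax $3, qty 7";

// Positive lookbehind: the digits must follow a '$', but the '$' isn't part of the match
string[] amounts = Regex.Matches (text, @"(?<=\$)\d+(?:\.\d\d)?")
  .Cast<Match>().Select (m => m.Value).ToArray();

// Negative lookbehind: digit runs not preceded by '$', another digit, or '.'
string[] others = Regex.Matches (text, @"(?<![$\d.])\d+")
  .Cast<Match>().Select (m => m.Value).ToArray();

Console.WriteLine (string.Join (", ", amounts));   // 42.50, 3
Console.WriteLine (string.Join (", ", others));    // 7
```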


Anchors 


The anchors ^ and $ match a particular position. By default:

^
Matches the start of the string

$
Matches the end of the string
^ has two context-dependent meanings: an anchor and a character class negator.


$ has two context-dependent meanings: an anchor and a replacement group denoter.


For example: 


Console.WriteLine (Regex.Match ("Not now", "^[Nn]o"));  // No
Console.WriteLine (Regex.Match ("f = 0.2F", "[Ff]$"));   // F


When you specify RegexOptions.Multiline or include (?m) in the expression:


• ^ matches the start of the string or line (directly after a \n).


• $ matches the end of the string or line (directly before a \n).


There’s a catch to using $ in multiline mode: a newline in Windows is nearly always 
denoted with \r\n rather than just \n. This means that for $ to be useful for
Windows files, you must usually match the \r, as well, with a positive lookahead:


(?=\r?$) 


The positive lookahead ensures that \r doesn’t become part of the result. The
following matches lines that end in ".txt":
string fileNames = "a.txt" + "\r\n" + "b.docx" + "\r\n" + "c.txt";
string r = @".+\.txt(?=\r?$)";
foreach (Match m in Regex.Matches (fileNames, r, RegexOptions.Multiline))
  Console.Write (m + " ");


OUTPUT: a.txt c.txt 
The following matches all empty lines in string s: 


MatchCollection emptyLines = Regex.Matches (s, "^(?=\r?$)",
  RegexOptions.Multiline);


The following matches all lines that are either empty or contain only whitespace: 


MatchCollection blankLines = Regex.Matches (s, "^[ \t]*(?=\r?$)",
  RegexOptions.Multiline);


Because an anchor matches a position rather than a character, 
specifying an anchor on its own matches an empty string: 


Console.WriteLine (Regex.Match ("x", "$").Length); // 0
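The following sketch contrasts the default and multiline behavior of ^, shows that \A ignores multiline mode (see Table 26-7), and reproduces the \r\n pitfall described above on a made-up three-line string.

```csharp
using System;
using System.Text.RegularExpressions;

string unix    = "alpha\nbeta\ngamma";
string windows = "alpha\r\nbeta\r\ngamma";

int firstOnly = Regex.Matches (unix, @"^\w+").Count;                           // 1
int perLine   = Regex.Matches (unix, @"^\w+", RegexOptions.Multiline).Count;   // 3
int absolute  = Regex.Matches (unix, @"\A\w+", RegexOptions.Multiline).Count;  // 1: \A ignores Multiline

// With \r\n endings, $ sits between \r and \n, so \w+$ fails on every line but the last...
int naive    = Regex.Matches (windows, @"^\w+$", RegexOptions.Multiline).Count;         // 1
// ...unless the optional \r is allowed for inside a lookahead
int tolerant = Regex.Matches (windows, @"^\w+(?=\r?$)", RegexOptions.Multiline).Count;  // 3

Console.WriteLine ($"{firstOnly} {perLine} {absolute} {naive} {tolerant}");
```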


Word Boundaries 


The word boundary assertion \b matches where word characters (\w) adjoin either: 
• Nonword characters (\W)


• The beginning/end of the string (^ and $)


\b is often used to match whole words: 
foreach (Match m in Regex.Matches ("Wedding in Sarajevo", @"\b\w+\b"))
Console.WriteLine (m); 


Wedding 
in 
Sarajevo 
The following statements highlight the effect of a word boundary: 


int one = Regex.Matches ("Wedding in Sarajevo", @"\bin\b").Count; // 1 
int two = Regex.Matches ("Wedding in Sarajevo", @"in").Count; // 2 


The next query uses positive lookahead to return words followed by “(sic)”: 


string text = "Don't loose (sic) your cool"; 
Console.Write (Regex.Match (text, @"\b\w+\b\s(?=\(sic\))")); // loose 
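Because \b is defined in terms of \w, punctuation inside a word also creates boundaries. The sketch below, on made-up input, shows that an apostrophe splits “It’s” into two matches, and uses \B (not on a word boundary, from Table 26-7) to find “in” only inside a word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The apostrophe is \W, so \b appears on both sides of it
string[] words = Regex.Matches ("Hello, world! It's 9am.", @"\b\w+\b")
  .Cast<Match>().Select (m => m.Value).ToArray();
Console.WriteLine (string.Join ("|", words));   // Hello|world|It|s|9am

// \B requires a non-boundary on each side, so only the "in" inside "Wedding" qualifies
int inside = Regex.Matches ("Wedding in Sarajevo", @"\Bin\B").Count;
Console.WriteLine (inside);                     // 1
```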


Groups 


Sometimes, it’s useful to separate a regular expression into a series of
subexpressions, or groups. For instance, consider the following regular expression that
represents a US phone number such as 206-465-1918:


\d{3}-\d{3}-\d{4} 
Suppose that we want to separate this into two groups: area code and local number. 
We can achieve this by using parentheses to capture each group: 
(\d{3})-(\d{3}-\d{4})
We then retrieve the groups programmatically: 
Match m = Regex.Match ("206-465-1918", @"(\d{3})-(\d{3}-\d{4})"); 
Console.WriteLine (m.Groups[1]);  // 206 
Console.WriteLine (m.Groups[2]);  // 465-1918 
The zeroth group represents the entire match. In other words, it has the same value
as the match’s Value:


Console.WriteLine (m.Groups[0]);  // 206-465-1918 
Console.WriteLine (m); // 206-465-1918 


Groups are part of the regular expressions language itself. This means that you can 
refer to a group within a regular expression. The \n syntax lets you index the group 
by group number n within the expression. For example, the expression (\w)ee\1 
matches deed and peep. In the following example, we find all words in a string
starting and ending in the same letter:


foreach (Match m in Regex.Matches ("pop pope peep", @"\b(\w)\w+\1\b")) 
Console.Write (m+ " "); // pop peep 


The brackets around the \w instruct the regular expressions engine to store the
submatch in a group (in this case, a single letter) so that it can be used later. We refer to
that group later using \1, meaning the first group in the expression.
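A back reference can also point at a group within the same short match. In the following sketch, (\w)\1 finds each doubled letter in a made-up word, and Groups[1] returns the letter that was captured.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (\w) captures one character; \1 then requires the same character again
MatchCollection doubles = Regex.Matches ("bookkeeper", @"(\w)\1");

string[] pairs   = doubles.Cast<Match>().Select (m => m.Value).ToArray();
string[] letters = doubles.Cast<Match>().Select (m => m.Groups[1].Value).ToArray();

Console.WriteLine (string.Join (" ", pairs));    // oo kk ee
Console.WriteLine (string.Join (" ", letters));  // o k e
```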
Named Groups


In a long or complex expression, it can be easier to work with groups by name 
rather than index. Here's a rewrite of the previous example, using a group that we 
name 'letter': 


string regEx =
  @"\b"            +  // word boundary
  @"(?'letter'\w)" +  // match first letter, and name it 'letter'
  @"\w+"           +  // match middle letters
  @"\k'letter'"    +  // match last letter, denoted by 'letter'
  @"\b";              // word boundary


foreach (Match m in Regex.Matches ("bob pope peep", regEx)) 
Console.Write (m+ " "); // bob peep 


Here's how to name a captured group: 
(?'group-name'group-expr) or (?<group-name>group-expr)
And here's how to refer to a group: 
\k'group-name' or \k<group-name> 
The following example matches a simple (non-nested) XML/HTML element by
looking for start and end nodes with a matching name:


string regFind =
  @"<(?'tag'\w+?).*>" +  // Lazy-match first tag, and name it 'tag'
  @"(?'text'.*?)"     +  // lazy-match text content, name it 'text'
  @"</\k'tag'>";         // match last tag, denoted by 'tag'


Match m = Regex.Match ("<h1>hello</h1>", regFind);
Console.WriteLine (m.Groups ["tag"]);   // h1
Console.WriteLine (m.Groups ["text"]);  // hello


Allowing for all possible variations in XML structure, such as nested elements, is 
more complex. The .NET regular expressions engine has a sophisticated extension 
called “matched balanced constructs” that can assist with nested tags—information 
on this is available on the internet and in Mastering Regular Expressions (O'Reilly) 
by Jeffrey E. F. Friedl. 
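The angle-bracket form of named groups works the same way. The sketch below applies it to the phone-number pattern from earlier (the surrounding sentence is made up) and lists the group names with Regex.GetGroupNames; because this pattern has no unnamed groups, the named groups also receive the numbers 1 and 2.

```csharp
using System;
using System.Text.RegularExpressions;

var phone = new Regex (@"(?<area>\d{3})-(?<local>\d{3}-\d{4})");
Match m = phone.Match ("Call 206-465-1918 today");

Console.WriteLine (m.Groups["area"].Value);    // 206
Console.WriteLine (m.Groups["local"].Value);   // 465-1918
Console.WriteLine (m.Groups[1].Value);         // 206: named groups are numbered too

// GetGroupNames lists group 0 first, then the named groups
Console.WriteLine (string.Join (",", phone.GetGroupNames()));   // 0,area,local
```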


Replacing and Splitting Text 


The Regex.Replace method works like string.Replace except that it uses a regular
expression. 


The following replaces “cat” with “dog”. Unlike with string.Replace, “catapult”
won't change into “dogapult” because we match on word boundaries: 
string find = @"\bcat\b";
string replace = "dog"; 
Console.WriteLine (Regex.Replace ("catapult the cat", find, replace)); 


OUTPUT: catapult the dog 







The replacement string can reference the original match with the $0 substitution 
construct. The following example wraps numbers within a string in angle brackets: 


string text = "10 plus 20 makes 30"; 
Console.WriteLine (Regex.Replace (text, @"\d+", @"<$0>")); 


OUTPUT: <10> plus <20> makes <30> 


You can access any captured groups with $1, $2, $3, and so on, or ${name} for a 
named group. To illustrate how this can be useful, consider the regular expression in 
the previous section that matched a simple XML element. By rearranging the 
groups, we can form a replacement expression that moves the element’s content into 
an XML attribute: 


string regFind =
  @"<(?'tag'\w+?).*>" +  // Lazy-match first tag, and name it 'tag'
  @"(?'text'.*?)"     +  // lazy-match text content, name it 'text'
  @"</\k'tag'>";         // match last tag, denoted by 'tag'


string regReplace =
  @"<${tag}"    +  // <tag
  @" value="""  +  //  value="
  @"${text}"    +  // text
  @"""/>";         // "/>


Console.Write (Regex.Replace ("<msg>hello</msg>", regFind, regReplace)); 
Here’s the result: 


<msg value="hello"/> 
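Numbered and named substitutions can also reorder text. The sketch below swaps a made-up “Last, First” name with $2 $1 and with ${first} ${last}; it also shows $$, the substitution for a literal dollar sign, combined with $0.

```csharp
using System;
using System.Text.RegularExpressions;

// Numbered groups: $2 and $1 reverse "Last, First"
string swapped = Regex.Replace ("Smith, John", @"(\w+),\s*(\w+)", "$2 $1");
Console.WriteLine (swapped);   // John Smith

// Named groups: the same swap with ${first} and ${last}
string named = Regex.Replace ("Smith, John",
  @"(?<last>\w+),\s*(?<first>\w+)", "${first} ${last}");
Console.WriteLine (named);     // John Smith

// $$ is a literal dollar sign, and $0 is the whole match
string price = Regex.Replace ("costs 5", @"\d+", "$$$0");
Console.WriteLine (price);     // costs $5
```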


MatchEvaluator Delegate 


Replace has an overload that takes a MatchEvaluator delegate, which is invoked per 
match. This allows you to delegate the content of the replacement string to C# code 
when the regular expressions language isn’t expressive enough: 


Console.WriteLine (Regex.Replace ("5 is less than 10", @"\d+", 
m => (int.Parse (m.Value) * 10).ToString()) ); 


OUTPUT: 50 is less than 100 
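Because the evaluator is ordinary C#, it can implement transformations that a replacement pattern can’t express. Two small sketches with made-up input: capitalizing the first letter of every word, and rewriting each decimal number in hexadecimal.

```csharp
using System;
using System.Text.RegularExpressions;

// \b\w matches the first character of each word; the evaluator upper-cases it
string title = Regex.Replace ("the quick brown fox", @"\b\w",
  m => m.Value.ToUpperInvariant());
Console.WriteLine (title);   // The Quick Brown Fox

// Rewrite every run of digits as a hexadecimal literal
string hex = Regex.Replace ("IDs 255 and 16", @"\d+",
  m => "0x" + int.Parse (m.Value).ToString ("X"));
Console.WriteLine (hex);     // IDs 0xFF and 0x10
```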


In “Cookbook Regular Expressions” on page 1009, we show 
how to use a MatchEvaluator to escape Unicode characters 
appropriately for HTML. 


Splitting Text 


The static Regex.Split method is a more powerful version of the string.Split
method, with a regular expression denoting the separator pattern. In this example, 
we split a string, where any digit counts as a separator: 


foreach (string s in Regex.Split ("a5b7c", @"\d"))
  Console.Write (s + " ");  // a b c
The result, here, doesn’t include the separators themselves. You can include the
separators, however, by wrapping the expression in a positive lookahead. The following
splits a camel-case string into separate words:


foreach (string s in Regex.Split ("oneTwoThree", @"(?=[A-Z])"))
  Console.Write (s + " ");  // one Two Three
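Two further variations, sketched with made-up input: if the separator pattern contains a capturing group, Regex.Split includes the captured text in the output array; and a pattern such as \s*[,;]\s* lets a single split handle inconsistent delimiters and spacing.

```csharp
using System;
using System.Text.RegularExpressions;

// A capturing group around the separator puts the separators into the result
string[] withDigits = Regex.Split ("a5b7c", @"(\d)");
Console.WriteLine (string.Join ("|", withDigits));   // a|5|b|7|c

// One pattern covers commas or semicolons with any surrounding whitespace
string[] fields = Regex.Split ("red , green;blue ;  yellow", @"\s*[,;]\s*");
Console.WriteLine (string.Join ("|", fields));       // red|green|blue|yellow
```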


Cookbook Regular Expressions 
Recipes 


Matching Social Security number/phone number 
string ssNum = @"\d{3}-\d{2}-\d{4}"; 


Console.WriteLine (Regex.IsMatch ("123-45-6789", ssNum)); // True 


string phone = @"(?x)
  ( \d{3}[-\s] | \(\d{3}\)\s? )
    \d{3}[-\s]?
    \d{4}";


Console.WriteLine (Regex.IsMatch ("123-456-7890", phone) ); // True 
Console.WriteLine (Regex.IsMatch ("(123) 456-7890", phone)); // True 
Extracting “name = value” pairs (one per line) 
Note that this starts with the multiline directive (?m): 
string r = @"(?m)\s*(?'name'\w+)\s*=\s*(?'value'.*)\s*(?=\r?$)"; 
string text =
  @"id = 3
    secure = true
    timeout = 30";

foreach (Match m in Regex.Matches (text, r))
  Console.WriteLine (m.Groups["name"] + " is " + m.Groups["value"]);

id is 3
secure is true
timeout is 30


Strong password validation 


The following checks whether a password has at least six characters, and whether it 
contains a digit, symbol, or punctuation mark: 


string r = @"(?x)^(?=.* ( \d | \p{P} | \p{S} )).{6,}";


Console.WriteLine (Regex.IsMatch ("abc12", r)); // False 
Console.WriteLine (Regex.IsMatch ("abcdef", r)); // False 
Console.WriteLine (Regex.IsMatch ("ab88yz", r)); // True 


Lines of at least 80 characters
string r = @"(?m)^.{80,}(?=\r?$)";


string fifty = new string ('x', 50); 
string eighty = new string ('x', 80); 


string text = eighty + "\r\n" + fifty + "\r\n" + eighty; 


Console.WriteLine (Regex.Matches (text, r).Count); // 2 


Parsing dates/times (N/N/N H:M:S AM/PM) 


This expression handles a variety of numeric date formats, and works whether the
year comes first or last. The (?x) directive improves readability by allowing
whitespace; the (?i) switches off case sensitivity (for the optional AM/PM designator).
You can then access each component of the match through the Groups collection:


string r = @"(?x)(?i)
 (\d{1,4}) [./-]
 (\d{1,2}) [./-]
 (\d{1,4}) [\sT]
 (\d+):(\d+):(\d+) \s? (A\.?M\.?|P\.?M\.?)?";


string text = "01/02/2008 5:20:50 PM"; 


foreach (Group g in Regex.Match (text, r).Groups)
  Console.Write (g.Value + " ");

01/02/2008 5:20:50 PM 01 02 2008 5 20 50 PM


(Of course, this doesn't verify that the date/time is correct.) 


Matching Roman numerals 


string r =
  @"(?i)\bm*"         +
  @"(d?c{0,3}|c[dm])" +
  @"(l?x{0,3}|x[lc])" +
  @"(v?i{0,3}|i[vx])" +
  @"\b";


Console.WriteLine (Regex.IsMatch ("MCMLXXXIV", r)); // True 


Removing repeated words 
Here, we capture a named group called dupe: 
string r = @"(?'dupe'\w+)\W\k'dupe'";


string text = "In the the beginning..."; 
Console.WriteLine (Regex.Replace (text, r, "${dupe}"));


In the beginning...
Word count
string r = @"\b(\w|[-'])+\b";


string text = "It's all mumbo-jumbo to me"; 
Console.WriteLine (Regex.Matches (text, r).Count); // 5 


Matching a GUID 


string r =
  @"(?i)\b"           +
  @"[0-9a-fA-F]{8}\-" +
  @"[0-9a-fA-F]{4}\-" +
  @"[0-9a-fA-F]{4}\-" +
  @"[0-9a-fA-F]{4}\-" +
  @"[0-9a-fA-F]{12}"  +
  @"\b";


string text = "Its key is {3F2504E0-4F89-11D3-9A0C-0305E82C3301}.";
Console.WriteLine (Regex.Match (text, r).Index); // 12 


Parsing an XML/HTML tag 


Regex is useful for parsing HTML fragments—particularly when the document 
might be imperfectly formed: 


string r =
  @"<(?'tag'\w+?).*>" +  // lazy-match first tag, and name it 'tag'
  @"(?'text'.*?)"     +  // lazy-match text content, name it 'text'
  @"</\k'tag'>";         // match last tag, denoted by 'tag'


string text = "<h1>hello</h1>";
Match m = Regex.Match (text, r);

Console.WriteLine (m.Groups ["tag"]);   // h1
Console.WriteLine (m.Groups ["text"]);  // hello


Splitting a camel-cased word 
This requires a positive lookahead to include the uppercase separators: 
string r = @"(?=[A-Z])";


foreach (string s in Regex.Split ("oneTwoThree", r)) 
Console.Write (s +" "); // one Two Three 


Obtaining a legal filename 




string input = "My \"good\" <recipes>.txt"; 





char[] invalidChars = System.IO.Path.GetInvalidFileNameChars();
string invalidString = Regex.Escape (new string (invalidChars)); 


string valid = Regex.Replace (input, "[" + invalidString + "]", "");
Console.WriteLine (valid);


My good recipes.txt


Escaping Unicode characters for HTML 


string htmlFragment = "© 2007"; 


string result = Regex.Replace (htmlFragment, @"[\u0080-\uFFFF]",
  m => @"&#" + ((int)m.Value[0]).ToString() + ";");


Console.WriteLine (result); // &#169; 2007 


Unescaping characters in an HTTP query string 


string sample = "C%23 rocks"; 


string result = Regex.Replace (
  sample,
  @"%[0-9a-f][0-9a-f]",
  m => ((char) Convert.ToByte (m.Value.Substring (1), 16)).ToString(),
  RegexOptions.IgnoreCase
);


Console.WriteLine (result); // C# rocks 


Parsing Google search terms from a web stats log 


You should use this in conjunction with the previous example to unescape
characters in the query string:


string sample =
  "http://google.com/search?hl=en&q=greedy+quantifiers+regex&btnG=Search";


Match m = Regex.Match (sample, @"(?<=google\..+search\?.*q=).+?(?=(&|$))");


string[] keywords = m.Value.Split ( 
new[] { '+' }, StringSplitOptions.RemoveEmptyEntries) ; 


foreach (string keyword in keywords) 
Console.Write (keyword + " "); // greedy quantifiers regex 


Regular Expressions Language Reference 


Table 26-2 through Table 26-12 summarize the regular expressions grammar and 
syntax supported in the .NET implementation. 
Table 26-2. Character escapes

Escape code sequence   Meaning                                           Hexadecimal equivalent
\a                     Bell                                              \u0007
\b                     Backspace                                         \u0008
\t                     Tab                                               \u0009
\r                     Carriage return                                   \u000D
\v                     Vertical tab                                      \u000B
\f                     Form feed                                         \u000C
\n                     Newline                                           \u000A
\e                     Escape                                            \u001B
\nnn                   ASCII character nnn as octal (e.g., \052)
\xnn                   ASCII character nn as hex (e.g., \x3F)
\cl                    ASCII control character l (e.g., \cG for Ctrl-G)
\unnnn                 Unicode character nnnn as hex (e.g., \u07DE)
\symbol                A nonescaped symbol





Special case: within a regular expression, \b means word boundary, except in a [ ]
set, in which \b means the backspace character.


Table 26-3. Character sets

Expression     Meaning                                                         Inverse (“not”)
[abcdef]       Matches a single character in the list                          [^abcdef]
[a-f]          Matches a single character in a range                           [^a-f]
\d             Matches a decimal digit; same as [0-9]                          \D
\w             Matches a word character (by default, varies according to       \W
               CultureInfo.CurrentCulture; for example, in English, same as
               [a-zA-Z_0-9])
\s             Matches a whitespace character; same as [\n\r\t\f\v ]           \S
\p{category}   Matches a character in a specified category (see Table 26-4)    \P
.              (Default mode) Matches any character except \n                  \n
.              (SingleLine mode) Matches any character                         \n
Table 26-4. Character categories

Expression   Meaning
\p{L}        Letters
\p{Lu}       Uppercase letters
\p{Ll}       Lowercase letters
\p{N}        Numbers
\p{P}        Punctuation
\p{M}        Diacritic marks
\p{S}        Symbols
\p{Z}        Separators
\p{Cc}       Control characters





Table 26-5. Quantifiers

Quantifier   Meaning
*            Zero or more matches
+            One or more matches
?            Zero or one match
{n}          Exactly n matches
{n,}         At least n matches
{n,m}        Between n and m matches





The ? suffix can be applied to any of the quantifiers to make them lazy rather than
greedy.


Table 26-6. Substitutions
$0              Substitutes the matched text
$group-number   Substitutes an indexed group-number within the matched text
${group-name}   Substitutes a text group-name within the matched text





Substitutions are specified only within a replacement pattern. 
Table 26-7. Zero-width assertions

Expression   Meaning
^            Start of string (or line in multiline mode)
$            End of string (or line in multiline mode)
\A           Start of string (ignores multiline mode)
\z           End of string (ignores multiline mode)
\Z           End of line or string
\G           Where search started
\b           On a word boundary
\B           Not on a word boundary
(?=expr)     Continue matching only if expression expr matches on right (positive lookahead)
(?!expr)     Continue matching only if expression expr doesn’t match on right (negative lookahead)
(?<=expr)    Continue matching only if expression expr matches on left (positive lookbehind)
(?<!expr)    Continue matching only if expression expr doesn’t match on left (negative lookbehind)
(?>expr)     Subexpression expr is matched once and not backtracked





Table 26-8. Grouping constructs

(expr)             Capture matched expression expr into indexed group
(?number)          Capture matched substring into a specified group number
(?'name')          Capture matched substring into group name
(?'name1-name2')   Undefine name2, and store interval and current group into name1; if name2 is
                   undefined, matching backtracks; name1 is optional
(?:expr)           Noncapturing group
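The difference between a capturing group, a noncapturing group, and the (?n) option (see Table 26-12) shows up in the Groups collection. In the sketch below, with made-up input, a repeated capturing group keeps only its last value in Value, while every repetition remains available through Captures.

```csharp
using System;
using System.Text.RegularExpressions;

Match capturing    = Regex.Match ("ababab", "(ab)+");
Match noncapturing = Regex.Match ("ababab", "(?:ab)+");
Match explicitOnly = Regex.Match ("ababab", "(?n)(ab)+");   // unnamed groups don't capture

Console.WriteLine (capturing.Groups.Count);     // 2: group 0 plus (ab)
Console.WriteLine (noncapturing.Groups.Count);  // 1: group 0 only
Console.WriteLine (explicitOnly.Groups.Count);  // 1: group 0 only

// A repeated group's Value is its last capture; Captures holds every repetition
Console.WriteLine (capturing.Groups[1].Value);           // ab
Console.WriteLine (capturing.Groups[1].Captures.Count);  // 3
```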





Table 26-9. Back references 


Parameter syntax Meaning 


\index       Reference a previously captured group by index


\k<name> Reference a previously captured group by name 





Table 26-10. Alternation 




Expression syntax Meaning 





| Logical or 
(?(expr)yes|no) Matches yes if expression matches; otherwise, matches no (no is optional) 


(?(name)yes|no) Matches yes if named group has a match; otherwise, matches no (no is optional) 
Table 26-11. Miscellaneous constructs


Expression syntax Meaning 


(?#comment) Inline comment 


#comment Comment to end of line (works only in IgnorePatternWhitespace mode) 





Table 26-12. Regular expression options

Option   Meaning
(?i)     Case-insensitive match (“ignore” case)
(?m)     Multiline mode; changes ^ and $ so that they match beginning and end of any line
(?n)     Captures only explicitly named or numbered groups
(?c)     Compiles to Intermediate Language
(?s)     Single-line mode; changes meaning of “.” so that it matches every character
(?x)     Eliminates unescaped whitespace from the pattern
(?r)     Searches from right to left; can’t be specified midstream
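The following sketch, with made-up inputs, shows that an inline option and the equivalent RegexOptions flag behave the same way, that the (?i:...) form limits an option to one subexpression, and that (?x) permits the # comments from Table 26-11.

```csharp
using System;
using System.Text.RegularExpressions;

bool plain     = Regex.IsMatch ("HELLO", "hello");                           // False
bool inlineOpt = Regex.IsMatch ("HELLO", "(?i)hello");                       // True
bool flagOpt   = Regex.IsMatch ("HELLO", "hello", RegexOptions.IgnoreCase);  // True

// (?i:...) applies the option to one subexpression only
bool scoped1 = Regex.IsMatch ("HELLO world", "^(?i:hello) world$");  // True
bool scoped2 = Regex.IsMatch ("HELLO WORLD", "^(?i:hello) world$");  // False

// (?x) ignores unescaped whitespace, and # starts a comment that runs to the end of the line
string pattern = @"(?x)
  \d{3}   # exchange
  -       # separator
  \d{4}   # line number";
bool spaced = Regex.IsMatch ("555-1234", pattern);   // True

Console.WriteLine ($"{plain} {inlineOpt} {flagOpt} {scoped1} {scoped2} {spaced}");
```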
27


The Roslyn Compiler 








The C# compiler is itself written in C# and available as a set of modular libraries
known as Roslyn. By referencing these libraries, you can utilize the compiler’s
functionality in many ways besides compiling source code to an assembly. For example,
you can write static code analysis and refactoring tools, editors with syntax
highlighting and code completion, and Visual Studio plug-ins that understand C# code.


You can download the Roslyn libraries from NuGet, and there are packages for both 
C# and Visual Basic. Because both languages share some architecture, there are 
common dependencies. The NuGet package ID for the C# compiler libraries is 
Microsoft.CodeAnalysis.CSharp. 


Roslyn’s GitHub site also includes documentation, examples, and walkthroughs that 
demonstrate code analysis and refactoring. 


Roslyn Architecture 


The Roslyn architecture separates compilation into three phases: 


1. Parsing code into syntax trees (the syntactic layer) 

2. Binding identifiers to symbols (the semantic layer) 

3. Emitting Intermediate Language (IL) 
In the first phase, a parser reads C# code and outputs syntax trees. A syntax tree is a 
Document Object Model (DOM) that describes source code in a tree structure.


The second phase is the one in which C#’s static binding takes place. Assembly
references are read, and the compiler determines, for instance, that “Console” refers to
System.Console in System.Console.dll. Overload resolution and type inference are 
a part of this, too. 
The third phase produces the output assembly. If you plan to use Roslyn for code
analysis or refactoring, you won’t use this functionality.


Visual Studio’s editor uses the output of the syntactic layer to color keywords,
strings, comments, and disabled code (in blue, red, green, and gray, respectively), 
whereas it uses the output of the semantic layer to color resolved type names (in 
turquoise). 


Workspaces 


In this chapter, we describe the compiler and the features it exposes. It's worth
keeping in mind that there are additional “layers” above the compiler, including
workspaces and features.


The workspaces layer is shipped in the Microsoft.CodeAnalysis.CSharp.Workspaces
NuGet package and provides APIs to work with solutions, projects, and documents. 


The features layer is shipped in Microsoft.CodeAnalysis.CSharp.Features and 
includes numerous APIs for code analysis and refactoring. 


Scripting 
With the Microsoft.CodeAnalysis.CSharp.Scripting NuGet package, you can write 
code such as the following: 

int result = (int) await CSharpScript.EvaluateAsync ("1 + 2"); 


Behind the scenes, the scripting API compiles “1 + 2” into a program that it then 
executes, so it’s less efficient than the solution that we described in Chapter 20 (see 
“Interoperating with Dynamic Languages” on page 863). There are more examples 
on how to use the Roslyn scripting API at
https://github.com/dotnet/roslyn/wiki/Scripting-API-Samples.


Syntax Trees 


A syntax tree is a DOM for source code. The syntax tree API is completely separate
from the System.Linq.Expressions API we discussed in “Expression Trees” on 
page 418 in Chapter 8, although the two have conceptual similarities. Both APIs can 
represent C# expressions in a DOM; however, a Roslyn syntax tree has the following 
unique features: 


• It can represent the entire C# language, not just expressions.

• It can include comments, whitespace, and other “trivia” and can round-trip
with full fidelity back to the original source code.

• It comes with a ParseText method that parses source code into a syntax tree.


Conversely, the System.Linq.Expressions API has the following unique features:
• It’s built into .NET Core, and the C# compiler itself is programmed to emit
System.Linq.Expressions types when it encounters a lambda expression with
an assignment conversion to Expression<T>.

• It has a fast and lightweight Compile method that emits a delegate. In contrast,
the semantic layer that compiles Roslyn syntax trees offers only the
heavyweight option of compiling a complete program into an assembly.


Something that both APIs have in common is that syntax trees are immutable: none of a
tree’s elements can be altered after the tree is created. This means that applications
such as Visual Studio and LINQPad must create a new syntax tree each time you
press a key in the editor in order to update syntax highlighting and autocompletion
services. This is less expensive than it sounds because the new syntax tree is able to
reuse most of the elements of the old (see “Transforming a Syntax Tree” on page
1029). And knowing that an object cannot change makes the API simpler to work
with. It also allows for easier and faster parallelization because multithreaded code
can safely access all parts of a syntax tree without locks.
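The following sketch illustrates immutability, assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced and C# 9 top-level statements are available. “Renaming” a class with ReplaceNode returns a new root node and leaves the original tree untouched; the new identifier token is built with the old token’s leading and trailing trivia so that the output keeps its spacing.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

SyntaxNode root = CSharpSyntaxTree.ParseText ("class Foo { }").GetRoot();
ClassDeclarationSyntax cls = root.DescendantNodes().OfType<ClassDeclarationSyntax>().Single();

// A replacement identifier that keeps the original token's trivia (the space after "Foo")
SyntaxToken newName = SyntaxFactory.Identifier (
  cls.Identifier.LeadingTrivia, "Bar", cls.Identifier.TrailingTrivia);

// ReplaceNode returns a new root; root and cls are not modified
SyntaxNode newRoot = root.ReplaceNode (cls, cls.WithIdentifier (newName));

Console.WriteLine (root.ToFullString());     // class Foo { }
Console.WriteLine (newRoot.ToFullString());  // class Bar { }
```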


SyntaxTree Structure 
A SyntaxTree comprises three main elements: 


Nodes 
(Abstract SyntaxNode class) Represents C# constructs such as expressions, 
statements, and method declarations. Nodes always have at least one child, so a
node can never be a leaf in the tree. Nodes can have both nodes and tokens as 
children. 


Tokens 
(SyntaxToken struct) Represents the identifiers, keywords, operators, and 
punctuation that make up your source code. The only kind of children that 
tokens can have is optional leading and trailing trivia. A token’s parent is 
always a node. 

Trivia 
(SyntaxTrivia struct) Trivia is for whitespace, comments, preprocessor
directives, and code that’s inactive due to conditional compilation. Trivia is always
associated with the token that’s immediately to its left or right, and is exposed 
via that token’s TrailingTrivia and LeadingTrivia properties, respectively. 


Figure 27-1 shows the structure of the following code, with nodes in black, tokens 
in gray, and trivia in white: 


Console.WriteLine ("Hello"); 







[Figure 27-1. Syntax trees: ExpressionStatement, InvocationExpression, StringLiteralExpression, WhitespaceTrivia (trailing trivia)]


SyntaxNode is abstract and has a C#-specific subclass for each kind of syntactic
element, such as VariableDeclarationSyntax or TryStatementSyntax.


SyntaxToken/SyntaxTrivia are structs, and so a single type represents every kind 
of token/trivia. To distinguish different kinds of token or trivia, you must use the 
RawKind property or Kind extension method (which we explain in the following 
section). 
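To see nodes, tokens, and trivia in code rather than in a visualizer, the following sketch (again assuming the Microsoft.CodeAnalysis.CSharp package and C# 9 top-level statements) parses the statement from Figure 27-1 with SyntaxFactory.ParseStatement and lists each token with its kind, its text, and how much trailing trivia it carries.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

StatementSyntax statement = SyntaxFactory.ParseStatement ("Console.WriteLine (\"Hello\");");
Console.WriteLine (statement.Kind());   // ExpressionStatement

// Each token reports its kind, its text, and how many trailing trivia items it owns
SyntaxToken[] tokens = statement.DescendantTokens().ToArray();
foreach (SyntaxToken token in tokens)
  Console.WriteLine ($"{token.Kind(),-18} {token.Text,-10} trailing trivia: {token.TrailingTrivia.Count}");
```

The space before the opening parenthesis shows up as one item of trailing trivia on the WriteLine identifier token.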


The best way to explore a syntax tree is with a visualizer.
Visual Studio has a downloadable visualizer for use with its
debugger, and LINQPad has one built in. LINQPad displays 
the visualizer automatically for the code in the text editor 
when you click the Tree button in the output window. You can 
also ask LINQPad to display a visualizer for a syntax tree that 
you've created programmatically by calling DumpSyntaxTree 
on the tree (or DumpSyntaxNode on a node). 





Understanding Node Types 


The subclasses of SyntaxNode have been designed to reflect the result of syntactical 
parsing, and are blind to semantic type/symbol information obtained from binding 
that occurs later. For example, consider the result of parsing the following code: 


using System;
class Foo : SomeBaseClass
{
  void Test() { Console.WriteLine(); }
}
You might expect Console.WriteLine to be represented by a class called
MethodCallExpressionSyntax, but no such class exists. Instead, it’s represented by an
InvocationExpressionSyntax, under which there’s a SimpleMemberAccessExpression.
This is because the parser is ignorant of types, so it cannot know that
Console is a type, and WriteLine is a method. There are many other possibilities:
Console could be a property of SomeBaseClass, or WriteLine could be an event, field,
or property of a delegate type. All we can know from the syntax is that we're
performing a member access (identifier.identifier), followed by some kind of invocation
with zero arguments.
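The following sketch, under the same package assumption, parses the Foo example and confirms what the parser sees: an InvocationExpressionSyntax whose Expression is a member access of kind SimpleMemberAccessExpression, with no arguments. SomeBaseClass is never resolved during parsing, so it doesn’t need to exist.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

SyntaxTree tree = CSharpSyntaxTree.ParseText (@"using System;
class Foo : SomeBaseClass
{
  void Test() { Console.WriteLine(); }
}");

// The parser sees an invocation of a member access; it can't tell types from members
InvocationExpressionSyntax invocation =
  tree.GetRoot().DescendantNodes().OfType<InvocationExpressionSyntax>().Single();
var access = (MemberAccessExpressionSyntax) invocation.Expression;

Console.WriteLine (access.Kind());                            // SimpleMemberAccessExpression
Console.WriteLine (access.Expression);                        // Console
Console.WriteLine (access.Name);                              // WriteLine
Console.WriteLine (invocation.ArgumentList.Arguments.Count);  // 0
```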











Common properties and methods 


Nodes, tokens, and trivia have a number of important common properties and 
methods: 


SyntaxTree property 
Returns the syntax tree to which the object belongs. 


Span property 
Returns the object’s position in source code (see “Finding a child by its offset” 
on page 1025). 


Kind extension method 
Returns a SyntaxKind enum that classifies the node, token, or trivia into one of
several hundred values (e.g., IntKeyword, CommaToken, and WhitespaceTrivia).
The same SyntaxKind enum covers nodes, tokens, and trivia.


ToString method 
Returns the text (source code) for the node, token, or trivia. For tokens, the 
Text property is equivalent. 


GetDiagnostics method 
Returns errors or warnings generated during parsing. 


IsEquivalentTo method 
Returns true if the object is identical to another node, token, or trivia instance.
Whitespace differences are significant (to ignore whitespace, call NormalizeWhitespace
before comparing).


Nodes and tokens also have a FullSpan property and ToFullString
method. These take into account trivia, whereas Span
and ToString do not.


The Kind extension method is a shortcut for casting the RawKind property, which is
of type int, to Microsoft.CodeAnalysis.CSharp.SyntaxKind. The reason for not
simply having a Kind property of type SyntaxKind is that the token and trivia types
are also used in Visual Basic syntax trees, which use a different enum type for
SyntaxKind.
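Here’s a small sketch of these members in action (CommonMembersDemo is our own illustrative name, and the sketch assumes the Microsoft.CodeAnalysis.CSharp package). It parses a class whose name is followed by extra spaces and a comment, so that the name token carries trailing trivia, and then compares the trivia-free and trivia-inclusive views of that token:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Illustrative helper (our own name, not a Roslyn API).
static class CommonMembersDemo
{
  public static (bool SameKind, TextSpan Span, TextSpan FullSpan, string Text, string FullText)
    Inspect()
  {
    // Extra spaces and a comment follow the class name on the same line:
    SyntaxTree tree = CSharpSyntaxTree.ParseText ("class  Foo  /* hi */ { }");
    SyntaxToken foo = tree.GetRoot().DescendantTokens().Single (t => t.Text == "Foo");

    // Kind() is equivalent to casting the language-neutral RawKind:
    bool sameKind = (SyntaxKind) foo.RawKind == foo.Kind();

    // Span and ToString exclude trivia; FullSpan and ToFullString include it:
    return (sameKind, foo.Span, foo.FullSpan, foo.ToString(), foo.ToFullString());
  }
}
```

For this input, Span should cover just the three characters of Foo (start 7, length 3), whereas FullSpan also takes in the two spaces, the comment, and the space that follow it (length 14).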
Obtaining a Syntax Tree


The static ParseText method on CSharpSyntaxTree parses C# code into a SyntaxTree:


SyntaxTree tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");

Console.WriteLine (tree.ToString());

tree.DumpSyntaxTree();   // Displays Syntax Tree Visualizer in LINQPad


To run this in a Visual Studio project, install the Microsoft.CodeAnalysis.CSharp 
NuGet package, and import the following namespaces: 


using Microsoft.CodeAnalysis; 
using Microsoft.CodeAnalysis.CSharp; 


You can optionally pass in a CSharpParseOptions object to specify a C# language 
version, preprocessor symbols, and a DocumentationMode to indicate whether XML 
comments should be parsed (see “Structured trivia” on page 1028). There’s also an 
option to specify a SourceCodeKind. Choosing Script instructs the parser to accept 
a single expression or statement(s) instead of requiring an entire program
(supported in Roslyn version 2 and later).
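As a sketch of supplying options, the following parses the same source with and without a preprocessor symbol defined and lists the classes that the parser treated as active code. ParseOptionsDemo, ActiveClasses, and the FOO symbol are our own illustrative names; pinning LanguageVersion.CSharp8 assumes a Roslyn version that knows about C# 8:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class ParseOptionsDemo
{
  const string Source = @"
#if FOO
class A { }
#else
class B { }
#endif";

  // Returns the names of the classes that were parsed as active code.
  public static string[] ActiveClasses (params string[] symbols)
  {
    var options = new CSharpParseOptions (
      languageVersion: LanguageVersion.CSharp8,
      preprocessorSymbols: symbols);

    SyntaxTree tree = CSharpSyntaxTree.ParseText (Source, options);

    return tree.GetRoot().DescendantNodes()
               .OfType<ClassDeclarationSyntax>()
               .Select (c => c.Identifier.Text)
               .ToArray();
  }
}
```

ActiveClasses ("FOO") should return just A, while ActiveClasses() should return just B; the inactive class ends up as trivia rather than as a ClassDeclarationSyntax node.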


Another way to obtain a syntax tree is to call CSharpSyntaxTree.Create, passing in
an object graph of nodes and tokens. We describe how to create these objects in 
“Transforming a Syntax Tree” on page 1029. 


After parsing a tree, you can obtain errors and warnings by calling GetDiagnostics. 
(You can also call this method on a specific node or token.) 


If the parse resulted in unexpected errors, the tree's structure 
may not be as you expect. For this reason, it’s worth calling 
GetDiagnostics before proceeding further. 


A nice feature is that a tree with errors will round-trip back to the original text (with 
the same errors). In such cases, the parser does its best to provide a syntax tree that’s 
useful to the semantic layer, creating “phantom nodes” if necessary. This allows tools 
such as code completion to work with incomplete code. (You can determine 
whether a node is phantom by checking the IsMissing property.) 
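The following sketch (MissingNodeDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) parses a method that omits a semicolon, checks the diagnostics, looks for the phantom token that the parser inserted, and confirms that the tree still round-trips to the original text:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Illustrative helper (our own name, not a Roslyn API).
static class MissingNodeDemo
{
  public static (bool HasErrors, bool RoundTrips, SyntaxKind MissingKind) Inspect()
  {
    string source = "class Foo { void Test() { int x = 1 } }";   // ';' is missing
    SyntaxTree tree = CSharpSyntaxTree.ParseText (source);

    bool hasErrors = tree.GetDiagnostics()
                         .Any (d => d.Severity == DiagnosticSeverity.Error);

    // The parser inserts a zero-width "phantom" token where ';' was expected:
    SyntaxToken phantom = tree.GetRoot().DescendantTokens().First (t => t.IsMissing);

    return (hasErrors, tree.ToString() == source, phantom.Kind());
  }
}
```

We'd expect an error diagnostic, a missing SemicolonToken, and a successful round trip.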


Calling GetDiagnostics on the syntax tree we created in the last section indicates
no errors, despite having called Console.WriteLine without importing the System
namespace. This is a good example of syntactic versus semantic parsing: our program
is syntactically correct, and our error will not manifest until we create a compilation,
add assembly references, and query the semantic model, where binding
takes place.
Traversing and Searching a Tree


A SyntaxTree acts as a wrapper for the tree structure. It has a reference to a single 
root node, which you obtain by calling GetRoot: 


var tree = CSharpSyntaxTree.ParseText (@"class Test
{
  static void Main() => Console.WriteLine (""Hello"");
}");
SyntaxNode root = tree.GetRoot();


The root node of a C# program is a CompilationUnitSyntax: 


Console.WriteLine (root.GetType().Name); // CompilationUnitSyntax 


Traversing children 


SyntaxNode exposes LINQ-friendly methods to traverse its child nodes and tokens. 
Here are the simplest: 


IEnumerable<SyntaxNode> ChildNodes()
IEnumerable<SyntaxToken> ChildTokens()


Following on from our previous example, our root node has a single child node of 
type ClassDeclarationSyntax: 


var cds = (ClassDeclarationSyntax) root.ChildNodes().Single(); 


We can enumerate the members of cds via either its ChildNodes method or the 
Members property of ClassDeclarationSyntax: 


foreach (MemberDeclarationSyntax member in cds.Members)
  Console.WriteLine (member.ToString());


with the following result:
static void Main() => Console.WriteLine ("Hello");


There are also Descendant* methods that descend recursively into children. We can 
enumerate the tokens that make up our program as follows: 


foreach (var token in root.DescendantTokens()) 
Console.WriteLine ($"{token.Kind(),-30} {token.Text}"); 


Here’s the result: 


ClassKeyword                   class
IdentifierToken                Test
OpenBraceToken                 {
StaticKeyword                  static
VoidKeyword                    void
IdentifierToken                Main
OpenParenToken                 (
CloseParenToken                )
EqualsGreaterThanToken         =>
IdentifierToken                Console
DotToken                       .
IdentifierToken                WriteLine
OpenParenToken                 (
StringLiteralToken             "Hello"
CloseParenToken                )
SemicolonToken                 ;
CloseBraceToken                }
EndOfFileToken


Notice that there's no whitespace in the result. Replacing token.Text with
token.ToFullString() would give us whitespace (and any other trivia).


The following uses the DescendantNodes method to locate the syntax node for our 
method declaration: 


var ourMethod = root.DescendantNodes()
  .First (m => m.Kind() == SyntaxKind.MethodDeclaration);


Or, alternatively: 


var ourMethod = root.DescendantNodes()
  .OfType<MethodDeclarationSyntax>()
  .Single();


With the latter example, ourMethod is of type MethodDeclarationSyntax, which 
exposes useful properties specific to method declarations. For instance, if our exam- 
ple contained more than one method definition and we wanted to find just the 
method whose name is “Main,” we could do this: 


var mainMethod = root.DescendantNodes()
  .OfType<MethodDeclarationSyntax>()
  .Single (m => m.Identifier.Text == "Main");


Identifier is a property on MethodDeclarationSyntax that returns the token
corresponding to the method’s identifier (i.e., its name). We could get the same result
with more effort, as follows: 


root.DescendantNodes().First (m => 
m.Kind() == SyntaxKind.MethodDeclaration && 
m.ChildTokens().Any (t => 
t.Kind() == SyntaxKind.IdentifierToken && t.Text == "Main")); 


SyntaxNode also has GetFirstToken and GetLastToken methods, which are equivalent
to calling DescendantTokens().First() and DescendantTokens().Last().


GetLastToken() is faster than DescendantTokens().Last() 
because it returns a direct link rather than enumerating 
through all descendants. 


As nodes can contain both child nodes and tokens whose relative order is significant,
there are also methods to enumerate both together:


ChildSyntaxList ChildNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokens()
IEnumerable<SyntaxNodeOrToken> DescendantNodesAndTokensAndSelf()
(ChildSyntaxList implements IEnumerable<SyntaxNodeOrToken> while also
exposing a Count property and an indexer to access an element by position.) 
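As a quick illustration, this sketch (ChildrenDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) lists the immediate children of a simple method declaration in source order, along with ChildSyntaxList’s Count and indexer:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class ChildrenDemo
{
  public static (string[] Children, int Count, string Second) Describe()
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText ("class C { void M() { } }").GetRoot();
    var method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().Single();

    ChildSyntaxList children = method.ChildNodesAndTokens();

    // Each element is either a node or a token, in source order:
    string[] described = children
      .Select (c => c.IsToken
                    ? "token: " + c.AsToken().Kind()
                    : "node: " + c.AsNode().Kind())
      .ToArray();

    return (described, children.Count, children[1].ToString());
  }
}
```

For void M() { }, we'd expect four children: the return type node, the identifier token M, the parameter list node, and the body block. Empty slots (such as the absent modifiers) should not appear.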


You can traverse trivia directly from a node with the GetLeadingTrivia,
GetTrailingTrivia, and DescendantTrivia methods. More commonly, though, you’d
access trivia through the token to which it’s attached via the token’s LeadingTrivia
and TrailingTrivia properties. Or, to convert to text, you’d use the ToFullString
method, which includes trivia in the result.


Traversing parents 
Nodes and tokens have a Parent property of type SyntaxNode. 
For SyntaxTrivia, the “parent” is its token, accessible via the Token property. 


Nodes also have methods that ascend back up the tree; these are prefixed with 
Ancestor. 
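Here’s a sketch (ParentDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) that climbs from the WriteLine token to its enclosing method and class, and finds the token that owns a comment:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class ParentDemo
{
  public static (string Method, string Type, string TriviaOwner) Inspect()
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText (
      "class Test { static void Main() { /* hi */ System.Console.WriteLine(); } }").GetRoot();

    SyntaxToken writeLine = root.DescendantTokens().Single (t => t.Text == "WriteLine");

    // A token's Parent is a node; Ancestors() climbs from there toward the root:
    var method = writeLine.Parent.Ancestors().OfType<MethodDeclarationSyntax>().First();
    var type = writeLine.Parent.FirstAncestorOrSelf<ClassDeclarationSyntax>();

    // A trivia's "parent" is its token. Because the comment sits on the same
    // line as the preceding '{', it is trailing trivia of that brace token:
    SyntaxTrivia comment = root.DescendantTrivia()
      .Single (t => t.Kind() == SyntaxKind.MultiLineCommentTrivia);

    return (method.Identifier.Text, type.Identifier.Text, comment.Token.Text);
  }
}
```

This should yield Main, Test, and {. The last result may be surprising: because the comment sits on the same line as the opening brace, the parser attaches it to the brace as trailing trivia (see “Trivia” on page 1027).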


Finding a child by its offset 


All nodes, tokens, and trivia have a Span property of type TextSpan to indicate starting
and ending offsets in the source code. Nodes and tokens also have a FullSpan
property that includes leading and trailing trivia (whereas Span does not). A node's 
Span does, however, include child nodes and tokens. 





Working with TextSpan 


The TextSpan struct has Start, Length, and End integer properties, which indicate
character offsets in the source code. It also has methods such as Overlap, OverlapsWith,
Intersection, and IntersectsWith. The difference between overlapping and
intersecting is a matter of one character: two spans overlap if one starts before the
other ends (<), whereas they intersect even if they merely touch (<=).


The SyntaxTree class exposes a GetLineSpan method that converts a TextSpan into a 
line and character offset. This method ignores the effects of any #line directives 
present in the source code. There's also a GetMappedLineSpan method that takes 
these directives into account. 
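The following sketch (TextSpanDemo and its methods are our own illustrative names; it assumes the Microsoft.CodeAnalysis.CSharp package) shows the one-character difference between overlapping and intersecting for two spans that merely touch, and then uses GetLineSpan to convert a token’s offset into a line and column:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Illustrative helper (our own name, not a Roslyn API).
static class TextSpanDemo
{
  public static (bool Overlaps, bool Intersects, TextSpan? Overlap, TextSpan? Intersection)
    CompareSpans()
  {
    var a = new TextSpan (0, 5);   // covers offsets 0..4; End is 5
    var b = new TextSpan (5, 3);   // covers offsets 5..7; starts where a ends

    return (a.OverlapsWith (b),    // false: no character in common
            a.IntersectsWith (b),  // true: the spans touch at offset 5
            a.Overlap (b),         // null
            a.Intersection (b));   // an empty span at offset 5
  }

  public static (int Line, int Character) LocateMethodName()
  {
    SyntaxTree tree = CSharpSyntaxTree.ParseText ("class A\n{\n  void M() { }\n}");
    SyntaxToken m = tree.GetRoot().DescendantTokens().Single (t => t.Text == "M");

    // GetLineSpan converts character offsets into a zero-based line and column:
    LinePosition start = tree.GetLineSpan (m.Span).StartLinePosition;
    return (start.Line, start.Character);
  }
}
```

For the two spans, OverlapsWith should return false and IntersectsWith true; Overlap returns null, while Intersection returns an empty span at offset 5. The method name M should be reported at line 2, character 7 (both zero-based).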











You can find a descendant object by position by calling the FindNode, FindToken,
and FindTrivia methods on SyntaxNode. These methods return the descendant
object with the smallest span that fully contains the span that you specify. There's
also a ChildThatContainsPosition method, which returns the immediate child node
or token that contains a given position.


Should a search result in two nodes with an identical span (typically a child and 
grandchild), the FindNode method will return the outer (parent) node. You can 
change this behavior by passing true to the optional argument
getInnermostNodeForTie.
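This sketch (FindDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) finds a token by offset, then searches for the span of the argument x in WriteLine (x). That span is shared by an ArgumentSyntax node and the IdentifierNameSyntax node inside it, so it exercises the tie-breaking rule:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Text;

// Illustrative helper (our own name, not a Roslyn API).
static class FindDemo
{
  public static (string Token, string Outer, string Inner) Inspect()
  {
    string source = "class Foo { void Test() { System.Console.WriteLine (x); } }";
    SyntaxNode root = CSharpSyntaxTree.ParseText (source).GetRoot();

    // FindToken: which token sits at this character offset?
    SyntaxToken token = root.FindToken (source.IndexOf ("WriteLine"));

    // The argument x and the identifier-name node inside it share one span:
    var span = new TextSpan (source.IndexOf ("(x)") + 1, 1);
    SyntaxNode outer = root.FindNode (span);                                // outer node wins the tie
    SyntaxNode inner = root.FindNode (span, getInnermostNodeForTie: true);  // inner node wins

    return (token.Text, outer.GetType().Name, inner.GetType().Name);
  }
}
```

We'd expect WriteLine, ArgumentSyntax (the outer node wins by default), and IdentifierNameSyntax (the inner node wins with getInnermostNodeForTie).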
The Find* methods also have an optional findInsideTrivia bool parameter. If
true, this also searches for nodes or tokens within structured trivia (see “Trivia” on 
page 1027). 


CSharpSyntaxWalker 


Another way to traverse a tree is by subclassing CSharpSyntaxWalker, overriding
one or more of its hundreds of virtual methods. The following class counts the
number of if statements:


class IfCounter : CSharpSyntaxWalker
{
  public int IfCount { get; private set; }

  public override void VisitIfStatement (IfStatementSyntax node)
  {
    IfCount++;
    // Call the base method if you want to descend into children.
    base.VisitIfStatement (node);
  }
}


Here's how to invoke it: 


var ifCounter = new IfCounter (); 
ifCounter.Visit (root); 
Console.WriteLine ($"I found {ifCounter.IfCount} if statements"); 


The result is equivalent to the following:
root.DescendantNodes().OfType<IfStatementSyntax>().Count()


Writing a syntax walker can be easier than using the Descendant* methods in more 
complex cases when you need to override multiple methods (in part, because C# has 
no F#-like pattern matching ability). 
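To illustrate a case that needs more than one override, here’s a sketch of a walker that records which method each invocation appears in. CallMapWalker and its Map helper are our own illustrative names (the sketch assumes the Microsoft.CodeAnalysis.CSharp package); VisitMethodDeclaration tracks the current method, while VisitInvocationExpression records the calls:

```csharp
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative walker (our own name, not a Roslyn API): records
// "method -> callee" for every invocation in the tree.
class CallMapWalker : CSharpSyntaxWalker
{
  string _currentMethod = "(none)";
  public List<string> Calls { get; } = new List<string>();

  public override void VisitMethodDeclaration (MethodDeclarationSyntax node)
  {
    string previous = _currentMethod;
    _currentMethod = node.Identifier.Text;
    base.VisitMethodDeclaration (node);      // descend into the method body
    _currentMethod = previous;
  }

  public override void VisitInvocationExpression (InvocationExpressionSyntax node)
  {
    Calls.Add ($"{_currentMethod} -> {node.Expression}");
    base.VisitInvocationExpression (node);   // also visit calls nested in arguments
  }

  public static List<string> Map (string source)
  {
    var walker = new CallMapWalker();
    walker.Visit (CSharpSyntaxTree.ParseText (source).GetRoot());
    return walker.Calls;
  }
}
```

Mapping class P { void A() { B(); C(); } void B() { System.Console.WriteLine (C()); } int C() => 0; } should produce A -> B, A -> C, B -> System.Console.WriteLine, and B -> C. Calling the base methods is what lets the walker descend into method bodies and into nested arguments.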


By default, CSharpSyntaxWalker visits just nodes. To visit tokens or trivia, you must 
call the base constructor with a SyntaxWalkerDepth, indicating the desired depth 
(node—token—trivia). Then, you can override VisitToken and VisitTrivia: 


class WhiteWalker : CSharpSyntaxWalker   // Counts whitespace characters
{
  public int SpaceCount { get; private set; }

  public WhiteWalker() : base (SyntaxWalkerDepth.Trivia) { }

  public override void VisitTrivia (SyntaxTrivia trivia)
  {
    SpaceCount += trivia.ToString().Count (char.IsWhiteSpace);
    base.VisitTrivia (trivia);
  }
}


If you remove WhiteWalker’s call to the base constructor, VisitTrivia will not fire. 
Trivia
Trivia is for code that, after parsing, the compiler can almost entirely ignore in
terms of producing an output assembly. This comprises whitespace, comments,
XML documentation, preprocessor directives, and code that’s inactive by virtue of
conditional compilation.


The mandatory whitespace in your code is also considered trivia. Although essential 
for parsing, it’s not needed once the syntax tree has been produced (at least by the 
compiler). Trivia is still important for round-tripping back to the original source 
code. 


Trivia belongs to the token to which it’s adjacent. By convention, the parser puts 
whitespace and comments that follow a token, up to the end of the line, into the 
token’s trailing trivia. Anything after that, it treats as leading trivia for the next 
token. (There are exceptions for the very start/end of the file.) If you're creating 
tokens programmatically (see “Transforming a Syntax Tree” on page 1029), you can 
put the whitespace in either place (or not at all, if you’re not going to convert back 
to source code): 


var tree = CSharpSyntaxTree.ParseText (@"class Program
{
 static /*comment*/ void Main() {}
}");
SyntaxNode root = tree.GetRoot();


// Find the static keyword token:
var method = root.DescendantTokens().Single (t =>
  t.Kind() == SyntaxKind.StaticKeyword);

// Print out the trivia around the static keyword token:
foreach (SyntaxTrivia t in method.LeadingTrivia)
  Console.WriteLine (new { Kind = "Leading " + t.Kind(), t.Span.Length });

foreach (SyntaxTrivia t in method.TrailingTrivia)
  Console.WriteLine (new { Kind = "Trailing " + t.Kind(), t.Span.Length });


Here's the output: 


{ Kind = Leading WhitespaceTrivia, Length = 1 } 

{ Kind = Trailing WhitespaceTrivia, Length = 1 } 

{ Kind = Trailing MultiLineCommentTrivia, Length = 11 } 
{ Kind = Trailing WhitespaceTrivia, Length = 1 } 


Preprocessor directives 


It might seem odd that preprocessor directives are considered trivia given that some 
directives (in particular, conditional compilation directives) have a nontrivial effect 
on the output. 


The reason is that preprocessor directives are processed semantically by the parser
itself; that is, it’s the parser’s job to do the preprocessing. After that, there's
nothing left that the compiler needs to explicitly consider (except for #pragma). To
illustrate, let’s examine how the parser handles conditional compilation directives:


#define FOO 


#if FOO 

Console.WriteLine ("FOO is defined"); 
#else 

Console.WriteLine ("FOO is not defined"); 
#endif 


Upon reading the #if FOO directive, the parser knows that FOO is defined, and so
the line that follows is parsed normally (as nodes and tokens), whereas the line of
code following the #else directive is parsed into DisabledTextTrivia.


When calling CSharpSyntaxTree.ParseText, you can supply
additional preprocessor symbols by constructing and passing in a
CSharpParseOptions instance.


Hence, with conditional compilation, it is precisely the text that can be ignored that 
ends up in trivia (i.e., the inactive code and the preprocessor directives themselves). 
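The following sketch (ConditionalDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) wraps the conditional compilation example in a method so that it parses as a complete program, then confirms which parts became nodes and which became trivia:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class ConditionalDemo
{
  const string Source = @"#define FOO
class C
{
  void M()
  {
#if FOO
    System.Console.WriteLine (""FOO is defined"");
#else
    System.Console.WriteLine (""FOO is not defined"");
#endif
  }
}";

  public static (int ActiveStatements, string[] TriviaKinds, string DisabledText) Inspect()
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText (Source).GetRoot();

    // Only the active branch is parsed into nodes:
    int active = root.DescendantNodes().OfType<ExpressionStatementSyntax>().Count();

    // The directives and the inactive code all end up as trivia:
    string[] kinds = root.DescendantTrivia()
      .Where (t => t.IsDirective || t.Kind() == SyntaxKind.DisabledTextTrivia)
      .Select (t => t.Kind().ToString())
      .ToArray();

    string disabled = root.DescendantTrivia()
      .Single (t => t.Kind() == SyntaxKind.DisabledTextTrivia)
      .ToString();

    return (active, kinds, disabled);
  }
}
```

We'd expect one active ExpressionStatementSyntax, and the trivia kinds DefineDirectiveTrivia, IfDirectiveTrivia, ElseDirectiveTrivia, DisabledTextTrivia, and EndIfDirectiveTrivia, in that order, with the disabled text holding the "FOO is not defined" line.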


The #line directive is handled similarly, in that the parser reads and interprets the
directive. The information that it harvests is used when you call GetMappedLineSpan
on the syntax tree.


The #region directive is semantically empty: the only role of the parser is to check 
that #region directives are matched with #endregion directives. The #error and 
#warning directives are also processed by the parser, which generates errors and 
warnings that you can see by calling GetDiagnostics on the tree or node. 


It can still be useful to examine the content of preprocessor directives for purposes 
other than producing the output assembly (syntax highlighting, for instance). This 
is made easier through structured trivia. 


Structured trivia 


There are two kinds of trivia: 


Unstructured trivia 
Comments, whitespace, and code that’s inactive due to conditional compilation 


Structured trivia 
Preprocessor directives and XML documentation 


Unstructured trivia is treated purely as text, whereas structured trivia also has its 
content parsed into a miniature syntax tree. 


The HasStructure property on SyntaxTrivia indicates whether structured trivia is
present, and the GetStructure method returns the root node for the miniature syntax
tree:
var tree = CSharpSyntaxTree.ParseText (@"#define FOO");


// In LINQPad: 
tree.DumpSyntaxTree(); // LINQPad displays structured trivia in Visualizer 


SyntaxNode root = tree.GetRoot(); 


var trivia = root.DescendantTrivia().First(); 

Console.WriteLine (trivia.HasStructure);                 // True

Console.WriteLine (trivia.GetStructure().Kind()); // DefineDirectiveTrivia 
In the case of preprocessor directives, you can navigate directly to the structured
trivia by calling GetFirstDirective on a SyntaxNode. There’s also a ContainsDirectives
property to indicate whether preprocessor trivia is present:


var tree = CSharpSyntaxTree.ParseText (@"#define FOO"); 
SyntaxNode root = tree.GetRoot(); 


Console.WriteLine (root.ContainsDirectives); // True 


// directive is the root node of the structured trivia: 

var directive = root.GetFirstDirective(); 

Console.WriteLine (directive.Kind()); // DefineDirectiveTrivia 
Console.WriteLine (directive.ToString());   // #define FOO


// If there were more directives, we could get to them as follows: 
Console.WriteLine (directive.GetNextDirective()); // (null) 


After we have a trivia node, we can cast it to a specific type and query its properties, 
just as we would with any other node: 


var hashDefine = (DefineDirectiveTriviaSyntax) root.GetFirstDirective(); 
Console.WriteLine (hashDefine.Name.Text);   // FOO


All nodes, tokens, and trivia have the IsPartOfStructuredTrivia
property to indicate whether the object in question is
part of a structured trivia tree (i.e., descends from a trivia
object).
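XML documentation comments are structured in the same way. The following sketch (DocCommentDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) relies on the default CSharpParseOptions, whose DocumentationMode parses documentation comments, and navigates into the miniature tree of a /// comment:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class DocCommentDemo
{
  public static (string StructureKind, string FirstElement) Inspect()
  {
    SyntaxTree tree = CSharpSyntaxTree.ParseText (
      "/// <summary>Says hello</summary>\nclass Greeter { }");

    // The doc comment is leading trivia of the class keyword, and it has structure:
    SyntaxTrivia docTrivia = tree.GetRoot().DescendantTrivia().First (t => t.HasStructure);
    var doc = (DocumentationCommentTriviaSyntax) docTrivia.GetStructure();

    // Query the miniature tree just like any other syntax tree:
    XmlElementSyntax element = doc.DescendantNodes().OfType<XmlElementSyntax>().First();

    return (doc.Kind().ToString(), element.StartTag.Name.LocalName.Text);
  }
}
```

This should report SingleLineDocumentationCommentTrivia and summary.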


Transforming a Syntax Tree 


You can “modify” nodes, tokens, and trivia via a set of methods with the following 
prefixes (most of which are extension methods): 


Add* 
Insert* 
Remove* 
Replace*
With* 
Without* 


Because syntax trees are immutable, all of these methods return a new object with 
the desired modifications, leaving the original untouched. 
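As a small demonstration of this immutability, the following sketch (ImmutabilityDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) uses the ReplaceToken extension method to rename a class, preserving the old token’s trivia, and then checks that the original root is unchanged:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Illustrative helper (our own name, not a Roslyn API).
static class ImmutabilityDemo
{
  public static (string Original, string Modified) RenameClass()
  {
    SyntaxNode root = CSharpSyntaxTree.ParseText ("class Foo { }").GetRoot();
    SyntaxToken oldName = root.DescendantTokens().Single (t => t.Text == "Foo");

    // Build a replacement identifier that keeps the old token's trivia:
    SyntaxToken newName = SyntaxFactory.Identifier (
      oldName.LeadingTrivia, "Bar", oldName.TrailingTrivia);

    // ReplaceToken returns a *new* root; root itself is left untouched:
    SyntaxNode newRoot = root.ReplaceToken (oldName, newName);

    return (root.ToFullString(), newRoot.ToFullString());
  }
}
```

The original root should still read class Foo { }, whereas the new root reads class Bar { }.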
Handling changes to the source code


If you're writing a C# editor, for instance, you'll need to update a syntax tree based
on changes to the source code. The SyntaxTree class has a WithChangedText
method that does exactly this: it partially reparses the source code based on
modifications that you describe with a SourceText instance (in
Microsoft.CodeAnalysis.Text).


To create a SourceText, use its static From method, giving it the complete source 
code. You then can use this to create a syntax tree: 


SourceText sourceText = SourceText.From ("class Program {}"); 
var tree = CSharpSyntaxTree.ParseText (sourceText); 


Alternatively, you can obtain the SourceText for an existing tree by calling GetText. 


You now can “update” sourceText by calling Replace or WithChanges. For example, 
we could replace the first five characters (class) with struct, as follows: 


var newSource = sourceText.Replace (0, 5, "struct"); 
Finally, we can call WithChangedText on the tree to update it: 


var newTree = tree.WithChangedText (newSource);
Console.WriteLine (newTree.ToString()); // struct Program {} 
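Instead of calling Replace, you can describe an edit with a TextChange (also in Microsoft.CodeAnalysis.Text) and pass it to WithChanges, which accepts any number of changes. Here’s a sketch (IncrementalDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) that performs the same class-to-struct edit that way and confirms that the old tree is untouched:

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Illustrative helper (our own name, not a Roslyn API).
static class IncrementalDemo
{
  public static (string Text, bool IsStruct, bool OldUnchanged) Apply()
  {
    SourceText sourceText = SourceText.From ("class Program {}");
    SyntaxTree tree = CSharpSyntaxTree.ParseText (sourceText);

    // Replace the span [0, 5) ("class") with "struct":
    var change = new TextChange (new TextSpan (0, 5), "struct");
    SourceText newSource = sourceText.WithChanges (change);

    SyntaxTree newTree = tree.WithChangedText (newSource);
    bool isStruct = newTree.GetRoot().DescendantNodes()
                           .OfType<StructDeclarationSyntax>().Any();

    return (newTree.ToString(), isStruct, tree.ToString() == "class Program {}");
  }
}
```

This should yield struct Program {}, a StructDeclarationSyntax in the new tree, and an unchanged original tree.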


Creating new nodes, tokens, and trivia with SyntaxFactory 


The static methods on SyntaxFactory programmatically create nodes, tokens, and 
trivia, which you can use to “transform” existing syntax trees or to create new trees 
from scratch. 


The most difficult part of doing this is establishing exactly what kind of nodes and 
tokens to create. The solution is to first parse a sample of the code you want,
examining the result in a syntax visualizer. For instance, suppose that we want to create a
syntax node for the following: 


using System.Text; 
We can visualize the syntax tree for this in LINQPad, as follows: 
CSharpSyntaxTree.ParseText ("using System.Text;").DumpSyntaxTree(); 


(We can parse using System.Text; without error because it’s valid as a complete 
program, albeit a functionally empty one. For most other code snippets, you'll need 
to wrap the snippet in a method and/or type definition so that it will parse.) 


The result has the following structure, of which we are interested in the second 
node—UsingDirective and its descendants: 


Kind                                  Token Text

CompilationUnit (node)
  UsingDirective (node)
    UsingKeyword (token)              using
      WhitespaceTrivia (trailing)
    QualifiedName (node)
      IdentifierName (node)
        IdentifierToken (token)       System
      DotToken (token)                .
      IdentifierName (node)
        IdentifierToken (token)       Text
    SemicolonToken (token)            ;
  EndOfFileToken (token)


Starting from the inside, we have two IdentifierName nodes, whose parent is a 
QualifiedName. We can create that as follows: 


QualifiedNameSyntax qualifiedName = SyntaxFactory.QualifiedName ( 
SyntaxFactory.IdentifierName ("System"), 
SyntaxFactory.IdentifierName ("Text")); 


We used the overload of QualifiedName that accepts two identifiers. This overload 
inserts the dot token for us automatically. 


We now need to wrap this in a UsingDirective: 


UsingDirectiveSyntax usingDirective = 
SyntaxFactory.UsingDirective (qualifiedName) ; 


Because we didn't specify tokens for the using keyword or the trailing semicolon, 
tokens for each were automatically created and added. However, the automatically 
created tokens don't include whitespace. This wouldn't prevent compilation, but 
converting the tree to a string would result in syntactically incorrect code: 


Console.WriteLine (usingDirective.ToFullString()); // usingSystem.Text; 


We can fix this by calling NormalizeWhitespace on the node (or one of its ancestors);
doing so automatically adds whitespace trivia (for both syntactic correctness
and readability). Or for more control, we could add whitespace explicitly: 


usingDirective = usingDirective.WithUsingKeyword ( 


usingDirective.UsingKeyword.WithTrailingTrivia ( 
SyntaxFactory.Whitespace (" "))); 


Console.WriteLine (usingDirective.ToFullString()); // using System.Text; 


For brevity, we “harvested” the node’s existing UsingKeyword to which we added 
trailing trivia. We could have created an equivalent token with more effort by calling 
SyntaxFactory.Token(SyntaxKind.UsingKeyword). 
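As a shortcut for building the QualifiedName by hand, SyntaxFactory.ParseName parses a dotted name from a string. The following sketch (UsingDemo is our own illustrative name; it assumes the Microsoft.CodeAnalysis.CSharp package) uses it to build the same using directive, and contrasts the raw output with the output after NormalizeWhitespace:

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Illustrative helper (our own name, not a Roslyn API).
static class UsingDemo
{
  public static (string Raw, string Normalized) Build()
  {
    // ParseName builds the QualifiedName (System . Text) from a dotted string:
    NameSyntax name = SyntaxFactory.ParseName ("System.Text");
    UsingDirectiveSyntax directive = SyntaxFactory.UsingDirective (name);

    return (directive.ToFullString(),                        // no whitespace at all
            directive.NormalizeWhitespace().ToString());    // whitespace added
  }
}
```

We'd expect usingSystem.Text; for the raw directive and using System.Text; after normalization.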


The final step is to add our UsingDirective node to an existing or new syntax tree 
(or more precisely, the root node of a tree). To do the former, we cast the existing
tree’s root to a CompilationUnitSyntax and call the AddUsings method. We then
can create a new tree from the transformed compilation unit: 


var existingTree = CSharpSyntaxTree.ParseText ("class Program {}"); 
var existingUnit = (CompilationUnitSyntax) existingTree.GetRoot(); 


var unitWithUsing = existingUnit.AddUsings (usingDirective); 
var treeWithUsing = CSharpSyntaxTree.Create (
unitWithUsing.NormalizeWhitespace()); 


Remember that all parts of a syntax tree are immutable.
Calling AddUsings returns a new node, leaving the original
untouched. Ignoring the return value is an easy mistake to
make!


We called NormalizeWhitespace on our compilation unit so that calling ToString 
on the tree will yield syntactically correct and readable code. Alternatively, we could 
have added explicit newline trivia to usingDirective, as follows: 


.WithTrailingTrivia (SyntaxFactory.EndOfLine ("\r\n\r\n"))


Creating a compilation unit and syntax tree from scratch is a similar process. The 
easiest approach is to start with an empty compilation unit and call AddUsings on 
the unit as we did before: 


var unit = SyntaxFactory.CompilationUnit().AddUsings (usingDirective) ; 


We can add type definitions to our compilation unit by creating them in a similar 
fashion, and then calling AddMembers: 


// Create a simple empty class definition: 
unit = unit.AddMembers (SyntaxFactory.ClassDeclaration ("Program") ); 


The final step is to create the tree: 


var tree = CSharpSyntaxTree.Create (unit.NormalizeWhitespace()); 
Console.WriteLine (tree.ToString()); 


// Output: 
using System.Text; 


class Program 
{ 
} 


CSharpSyntaxRewriter 


For more complex syntax tree transformations, you can subclass
CSharpSyntaxRewriter.


CSharpSyntaxRewriter is similar to the CSharpSyntaxWalker class that we looked 
at previously (see “CSharpSyntaxWalker” on page 1026) except that each Visit* 
method accepts and returns a syntax node. By returning something other than what was
passed in, you can “rewrite” the syntax tree.


For instance, the following rewriter changes method declaration names to 
uppercase: 


class MyRewriter : CSharpSyntaxRewriter
{
  public override SyntaxNode VisitMethodDeclaration
    (MethodDeclarationSyntax node)
  {
    // "Replace" the method's identifier with an uppercase version:
    return node.WithIdentifier (
      SyntaxFactory.Identifier (
        node.Identifier.LeadingTrivia,              // Preserve old trivia
        node.Identifier.Text.ToUpperInvariant(),
        node.Identifier.TrailingTrivia));           // Preserve old trivia
  }
}


Here’s how to use it: 


var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() { Test(); }
  static void Test() { }
}");


var rewriter = new MyRewriter(); 
var newRoot = rewriter.Visit (tree.GetRoot()); 
Console.WriteLine (newRoot.ToFullString()); 


// Output: 
class Program 
{ 
static void MAIN() { Test(); } 
static void TEST() { } 
} 


Notice that our call to Test() in the Main method did not get renamed, because we
visited just method declarations and ignored invocations. To reliably rename
invocations, however, we must be able to determine whether calls to Main() or Test()
refer to the Program type, and not some other type. To do this, a syntax tree is not
enough on its own; we also need a semantic model.


Compilations and Semantic Models 


A compilation comprises syntax trees, references, and compilation options. It serves 
two purposes: 

• Allows compilation to a library or executable (the emit phase).

• Exposes a semantic model that provides symbol information (obtained from binding).


The semantic model is essential in implementing features such as symbol renaming, 
or offering code completion listings in an editor. 
Creating a Compilation


Whether you're interested in querying the semantic model or performing a full 
compilation, the first step is to create a CSharpCompilation, passing in the (simple) 
name of the assembly that you want to create: 


var compilation = CSharpCompilation.Create ("test"); 


An assembly’s simple name is important even if you don’t plan to emit an assembly, 
because it forms part of the identity of the types inside the compilation. 


By default, the compilation assumes that you want to create a library. You can specify a
different kind of output (Windows executable, console executable, etc.) as follows:


compilation = compilation.WithOptions ( 
new CSharpCompilationOptions (OutputKind.ConsoleApplication) ); 


The CSharpCompilationOptions class has more than a dozen optional constructor 
parameters for options that you can pass to the compiler. For example, to enable 
compiler optimizations, you would do this: 


compilation = compilation.WithOptions (
  new CSharpCompilationOptions (OutputKind.ConsoleApplication,
    optimizationLevel: OptimizationLevel.Release));


Next, let’s add syntax trees. Each syntax tree corresponds to a “file” to be included in 
the compilation: 


var tree = CSharpSyntaxTree.ParseText (@"class Program 


{ 


static void Main() => System.Console.WriteLine (""Hello""); 


}"); 
compilation = compilation.AddSyntaxTrees (tree); 


Finally, we need to reference the .NET Core assemblies. Because it’s difficult to 
know exactly what combination of assemblies is required, it’s easiest to reference
them all. The following code returns all the .NET Core assemblies (plus any that the 
calling application references): 


string trustedAssemblies = (string)AppContext.GetData 
("TRUSTED_PLATFORM_ASSEMBLIES"); 
string[] trustedAssemblyPaths = trustedAssemblies.Split(Path.PathSeparator) ; 


Note that this returns runtime assemblies, which are specific to
the current platform and .NET Core version. If you’re planning
to use Roslyn to compile libraries that will work correctly
across different platforms and .NET Core versions, you should
use reference assemblies instead. The reference assemblies are
available in the NuGet packages Microsoft.NETCore.App.Ref
(for .NET Core), Microsoft.AspNetCore.App.Ref (for ASP.NET
Core), and Microsoft.WindowsDesktop.App.Ref (for Windows
Forms/WPF).


We then can add the references to the compilation, as follows: 
var references = trustedAssemblyPaths.Select
(path => MetadataReference.CreateFromFile (path)); 


compilation = compilation.AddReferences (references); 


The call to MetadataReference.CreateFromFile reads the content of an assembly 
into memory, but not using ordinary reflection. Instead, it uses a high-performance 
assembly reader (System.Reflection.Metadata), which avoids creating an Assembly 
object. (Creating an Assembly object would be slow and result in the assembly file 
being locked until the process exited.) 


The PortableExecutableReference that you get back from
MetadataReference.CreateFromFile can end up with a
significant memory footprint, so be careful about holding on to
references that you don't need. Also, if you find yourself 
repeatedly creating references to the same assembly, a cache is 
worth considering (one that holds weak references is ideal). 


You can do everything in a single step by calling the overload of
CSharpCompilation.Create that takes syntax trees, references, and options. Or you can do
it fluently in a single expression, too:


var compilation = CSharpCompilation.Create ("...")
  .WithOptions (...)
  .AddSyntaxTrees (...)
  .AddReferences (...);


Diagnostics 


A compilation can generate errors and warnings even if the syntax trees are error 
free. Examples include forgetting to import a namespace, a typo when referring to a 
type or member name, and type parameter inference failing. You can get the errors 
and warnings by calling GetDiagnostics on the compilation object. Any syntax 
errors will be included, too. 
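To see a semantic error that the syntax tree doesn’t report, here’s a sketch (CompilationDiagnosticsDemo is our own illustrative name) that compiles a program calling Console.WriteLine without importing System. It assumes .NET Core (so that TRUSTED_PLATFORM_ASSEMBLIES is available) and the Microsoft.CodeAnalysis.CSharp package:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Illustrative helper (our own name, not a Roslyn API); assumes .NET Core.
static class CompilationDiagnosticsDemo
{
  public static (int SyntaxErrors, string[] SemanticErrorIds) Check()
  {
    // Syntactically fine, but Console can't be resolved without "using System":
    SyntaxTree tree = CSharpSyntaxTree.ParseText (
      "class Program { static void Main() => Console.WriteLine (\"Hello\"); }");

    var references = ((string) AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .WithOptions (new CSharpCompilationOptions (OutputKind.ConsoleApplication))
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    int syntaxErrors = tree.GetDiagnostics().Count();

    string[] errorIds = compilation.GetDiagnostics()
      .Where (d => d.Severity == DiagnosticSeverity.Error)
      .Select (d => d.Id)
      .ToArray();

    return (syntaxErrors, errorIds);
  }
}
```

The tree itself should report no diagnostics, whereas the compilation should report CS0103 (the name Console does not exist in the current context).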


Emitting an Assembly 
Creating an output assembly is simply a matter of calling Emit: 


EmitResult result = compilation.Emit (@"c:\temp\test.dll"); 
Console.WriteLine (result.Success); 


If result.Success is false, EmitResult also has a Diagnostics property to indicate
the errors that occurred during emission (this also includes diagnostics from the 
previous stages). If Emit fails due to a file I/O error, it will throw an exception rather 
than generate error codes. 


With .NET Core, you must specify a .dll extension even for console or Windows
applications. To run the application, you then call dotnet.exe with the path to
your .dll.
The Emit method also lets you specify a .pdb file path (for debug information), and
an XML documentation file path. 
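If you want to run the output without touching the file system, Emit also accepts a Stream. The following sketch (InMemoryEmitDemo and the Calc class are our own illustrative names) emits a library to a MemoryStream, loads the bytes with Assembly.Load, and calls a method via reflection. It assumes .NET Core and the Microsoft.CodeAnalysis.CSharp package:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

// Illustrative helper (our own name, not a Roslyn API); assumes .NET Core.
static class InMemoryEmitDemo
{
  public static int CompileAndRun()
  {
    SyntaxTree tree = CSharpSyntaxTree.ParseText (
      "public static class Calc { public static int Add (int a, int b) => a + b; }");

    var references = ((string) AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("calc")   // a library, by default
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    using var peStream = new MemoryStream();
    EmitResult result = compilation.Emit (peStream);
    if (!result.Success)
      throw new InvalidOperationException (string.Join ("\n", result.Diagnostics));

    // Load the emitted bytes and call Calc.Add via reflection:
    Assembly assembly = Assembly.Load (peStream.ToArray());
    MethodInfo add = assembly.GetType ("Calc").GetMethod ("Add");
    return (int) add.Invoke (null, new object[] { 2, 3 });
  }
}
```

CompileAndRun() should return 5.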


Querying the Semantic Model 


Calling GetSemanticModel on a compilation returns the semantic model for a syntax 
tree: 


var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main() => System.Console.WriteLine (123);
}");

var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
  .Split (Path.PathSeparator)
  .Select (path => MetadataReference.CreateFromFile (path));


var compilation = CSharpCompilation.Create ("test") 
.AddReferences (references) 
.AddSyntaxTrees (tree); 


SemanticModel model = compilation.GetSemanticModel (tree); 


(The reason for needing to specify a tree is that a compilation can contain multiple 
trees.) 


You might expect a semantic model to be similar to a syntax tree, only with more
properties and methods and a more detailed structure. This is not the case: there
is no overarching DOM associated with the semantic model. Instead, you're given a
set of methods to call to obtain semantic information about a particular position or
node in the syntax tree.


This means that you can’t “explore” a semantic model like you would a syntax tree, 
and using it is rather like playing “20 Questions”: the challenge is figuring out the 
right questions to ask. There are nearly 50 methods and extension methods; in this 
section, we'll cover some of the most commonly used methods, in particular, those 
that demonstrate the principles of using the semantic model. 


Following on from our previous example, we could ask for symbol information on 
the WriteLine identifier, as follows: 


var writeLineNode = tree.GetRoot().DescendantTokens().Single ( 
t => t.Text == "WriteLine").Parent; 


SymbolInfo symbolInfo = model.GetSymbolInfo (writeLineNode);
Console.WriteLine (symbolInfo.Symbol); // System.Console.WriteLine(int) 


SymbolInfo is a wrapper for symbols, whose nuances we discuss shortly. We begin
with symbols.
Symbols


In the syntax tree, names such as System, Console, and WriteLine are parsed as 
identifiers (IdentifierNameSyntax nodes). Identifiers have little meaning, and the
syntactic parser does no work on “understanding” them other than to distinguish 
them from contextual keywords. 


The semantic model is able to transform identifiers into symbols, which have type 
information (the output of the binding phase). 


All symbols implement the ISymbol interface, although there are more specific 
interfaces for each kind of symbol. In our example, System, Console, and WriteLine 
map to symbols of the following types: 


System INamespaceSymbol 
Console INamedTypeSymbol 
WriteLine IMethodSymbol 


Some symbol types, such as IMethodSymbol, have a conceptual analog in the 
System.Reflection namespace (MethodInfo, in this case), whereas some other 
symbol types, such as INamespaceSymbol, do not. This is because the Roslyn type 
system exists for the benefit of the compiler, whereas the Reflection type system 
exists for the benefit of the CLR (after the source code has melted away). 


Nonetheless, working with ISymbol types is similar in many ways to using the 
Reflection API we described in Chapter 19. Let’s extend our previous example: 


ISymbol symbol = model.GetSymbolInfo (writeLineNode).Symbol; 


Console.WriteLine (symbol.Name);                  // WriteLine
Console.WriteLine (symbol.Kind);                  // Method
Console.WriteLine (symbol.IsStatic);              // True
Console.WriteLine (symbol.ContainingType.Name);   // Console


var method = (IMethodSymbol) symbol; 
Console.WriteLine (method.ReturnType.ToString()); // void 


The output of the last line illustrates a subtle difference with Reflection. Notice that 
void is in lowercase, which is C# nomenclature (Reflection is language-agnostic). 
Similarly, calling ToString() on the INamedTypeSymbol for System.Int32 returns
int. Here’s something else you can't do with Reflection: 


Console.WriteLine (symbol.Language);   // C#


With the syntax trees API, the classes for syntax nodes differ 
for C# and Visual Basic (although they share an abstract 
SyntaxNode base type). This makes sense because the
languages have a different lexical structure. In contrast, ISymbol
and its derived interfaces are shared between C# and Visual 
Basic. However, their internal concrete implementations are 
specific to each language, and the output from their methods 
and properties reflects language-specific differences. 
We can also ask the symbol where it came from:


var location = symbol.Locations.First();
Console.WriteLine (location.Kind);                 // MetadataFile


If the symbol was defined in our own source code (i.e., a syntax tree), the SourceTree
property will return that tree, and SourceSpan will return its location in the
tree:


Console.WriteLine (location.SourceTree == null);   // True
Console.WriteLine (location.SourceSpan);           // [0..0)


A partial type can have multiple definitions, in which case it will have multiple 
Locations. 


The following query returns all the overloads of WriteLine: 
symbol.ContainingType.GetMembers ("WriteLine").OfType<IMethodSymbol>()


You can also call ToDisplayParts on a symbol. This returns a collection of parts
that make up the full name. In our case, System.Console.WriteLine(int) comprises
four symbols interspersed with punctuation.
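To see the individual parts, the sketch below binds the WriteLine call from our earlier example and returns the method symbol along with its display parts. It assumes the Microsoft.CodeAnalysis.CSharp package. DisplayPartsDemo and GetCallParts are hypothetical names. Each SymbolDisplayPart carries a Kind, such as NamespaceName, ClassName, MethodName, Keyword, or Punctuation, which an editor could use to colorize a tooltip:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class DisplayPartsDemo
{
  // Hypothetical helper: binds the single method call in 'source' and returns
  // the method it resolves to, plus that method's display parts.
  public static (ISymbol Symbol, SymbolDisplayPart[] Parts) GetCallParts (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Find the invocation by its syntax type rather than by token text.
    var call = tree.GetRoot().DescendantNodes()
      .OfType<InvocationExpressionSyntax>()
      .Single();

    ISymbol symbol = model.GetSymbolInfo (call).Symbol;
    return (symbol, symbol.ToDisplayParts().ToArray());
  }
}
```

If you iterate over the returned parts and print part.Kind next to part.ToString(), each name and punctuation mark appears on its own line. Concatenating the parts' text reproduces symbol.ToDisplayString().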


SymbolInfo


If you're writing code completion for an editor, you'll need to obtain symbols for 
code that’s incomplete or incorrect. For instance, consider the following incomplete 
code: 


System.Console.WriteLine(


Because the WriteLine method is overloaded, it's impossible to match it to a single
ISymbol. Instead, we want to present options to the user. To deal with this, the
semantic model's GetSymbolInfo method returns a SymbolInfo struct, which has
the following properties:


ISymbol Symbol 
ImmutableArray<ISymbol> CandidateSymbols 
CandidateReason CandidateReason 


If there's an error or ambiguity, the Symbol property returns null, and CandidateSymbols
returns a collection comprising the best matches. The CandidateReason
property returns an enum telling you what went wrong. 
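To show CandidateSymbols and CandidateReason without relying on incomplete code, the sketch below binds a call that matches no WriteLine overload: Console.WriteLine (1, 2) has no two-argument overload that accepts an int as its first argument. It assumes the Microsoft.CodeAnalysis.CSharp package. CandidateDemo and GetCallInfo are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class CandidateDemo
{
  // Hypothetical helper: returns the SymbolInfo for the single call in 'source'.
  public static SymbolInfo GetCallInfo (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    var call = tree.GetRoot().DescendantNodes()
      .OfType<InvocationExpressionSyntax>()
      .Single();

    // When binding fails, Symbol is null and the near misses are
    // reported in CandidateSymbols, with the cause in CandidateReason.
    return model.GetSymbolInfo (call);
  }
}
```

An editor could show info.CandidateSymbols in a parameter-information popup. The exact CandidateReason reported (for example, OverloadResolutionFailure) depends on why binding failed.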


To obtain error and warning information for a section of code,
you can also call GetDiagnostics on a semantic model, specifying
a TextSpan. Calling GetDiagnostics with no argument is
similar to calling the same method on the CSharpCompilation
object, except that the results are limited to the model's syntax tree.


Symbol accessibility 


ISymbol has a DeclaredAccessibility property that indicates whether the symbol 
is public, protected, internal, and so on. However, this isn't sufficient to determine
whether a given symbol is accessible at a particular position in your source code.
Local variables, for instance, have a lexically limited scope, and a protected class 
member is accessible from source code positions within its type or a derived type. 
To help with this, SemanticModel has an IsAccessible method:


bool canAccess = model.IsAccessible (42, someSymbol); 


This returns true if someSymbol can be accessed at offset 42 in the source code. 
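Here is a minimal sketch of IsAccessible applied to a private field. It assumes the Microsoft.CodeAnalysis.CSharp package. AccessibilityDemo, CanSeeSecretAt, and the two sample classes are hypothetical. Rather than navigating the syntax tree, the sketch obtains the field symbol through Compilation.GetTypeByMetadataName, and it uses IndexOf to convert a marker word into a source offset:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class AccessibilityDemo
{
  const string Source =
    "class A { private int secret; void Peek() { } }\n" +
    "class B { void Poke() { } }";

  // Hypothetical helper: reports whether A.secret is accessible at the
  // first occurrence of 'marker' in Source.
  public static bool CanSeeSecretAt (string marker)
  {
    var tree = CSharpSyntaxTree.ParseText (Source);

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    // Get the field symbol directly from the compilation's type system.
    ISymbol secret = compilation.GetTypeByMetadataName ("A")
      .GetMembers ("secret")
      .Single();

    int position = Source.IndexOf (marker);   // offset into the source text
    return model.IsAccessible (position, secret);
  }
}
```

Inside class A (at "Peek"), the private field is accessible. Inside class B (at "Poke"), it is not, even though the field's DeclaredAccessibility is the same in both cases.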


Declared symbols 
If you call GetSymbolInfo on a type or member declaration, you'll get no symbols
back. For instance, suppose that we want the symbol for our Main method:


var mainMethod = tree.GetRoot().DescendantTokens().Single ( 
t => t.Text == "Main").Parent; 


SymbolInfo symbolInfo = model.GetSymbolInfo (mainMethod);
Console.WriteLine (symbolInfo.Symbol == null);             // True
Console.WriteLine (symbolInfo.CandidateSymbols.Length);    // 0


This applies not just to type/member declarations, but to any
node where you're introducing a new symbol rather than consuming
an existing symbol.


To obtain the symbol, we must instead call GetDeclaredSymbol: 


ISymbol symbol = model.GetDeclaredSymbol (mainMethod) ; 


Unlike GetSymbolInfo, GetDeclaredSymbol either succeeds or it doesn't. (If it fails,
it will be because it can't find a valid declaration node.)
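In the following sketch, declaration nodes are located by their syntax type instead of by token text. The sketch asks for the IMethodSymbol of Main and for the declared symbol of every variable declarator, keeping only the locals. It assumes the Microsoft.CodeAnalysis.CSharp package. DeclaredSymbolDemo and Inspect are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class DeclaredSymbolDemo
{
  // Hypothetical helper: returns the declared symbol for Main, plus the
  // declared symbols of all local variables in 'source'.
  public static (IMethodSymbol Main, ILocalSymbol[] Locals) Inspect (string source)
  {
    var tree = CSharpSyntaxTree.ParseText (source);

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);
    SyntaxNode root = tree.GetRoot();

    MethodDeclarationSyntax mainDecl = root.DescendantNodes()
      .OfType<MethodDeclarationSyntax>()
      .Single (m => m.Identifier.Text == "Main");

    var main = (IMethodSymbol) model.GetDeclaredSymbol (mainDecl);

    // Each variable declarator introduces a new symbol, so GetDeclaredSymbol applies.
    ILocalSymbol[] locals = root.DescendantNodes()
      .OfType<VariableDeclaratorSyntax>()
      .Select (v => model.GetDeclaredSymbol (v))
      .OfType<ILocalSymbol>()
      .ToArray();

    return (main, locals);
  }
}
```

Matching on MethodDeclarationSyntax and VariableDeclaratorSyntax avoids depending on which token's Parent happens to be the declaration node.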


To give another example, suppose that our Main method is as follows: 


static void Main()
{
  int xyz = 123;
}


We can determine the type of xyz as follows: 


SyntaxNode variableDecl = tree.GetRoot().DescendantTokens().Single ( 
t => t.Text == "xyz").Parent; 


var local = (ILocalSymbol) model.GetDeclaredSymbol (variableDecl); 
Console.WriteLine (local.Type.ToString()); // int 
Console.WriteLine (local.Type.BaseType.ToString());   // System.ValueType


TypeInfo


Sometimes, you need type information about an expression or literal for which 
there's no explicit symbol. Consider the following: 


var now = System.DateTime.Now; 
System.Console.WriteLine (now - now); 
To determine the type of now - now, we call GetTypeInfo on the semantic model:


SyntaxNode binaryExpr = tree.GetRoot().DescendantTokens().Single ( 
t => t.Text == "-").Parent; 


TypeInfo typeInfo = model.GetTypeInfo (binaryExpr); 


TypeInfo has two properties, Type and ConvertedType. The latter indicates the type
after any implicit conversions: 


Console.WriteLine (typeInfo.Type);            // System.TimeSpan
Console.WriteLine (typeInfo.ConvertedType);   // object


Because Console.WriteLine is overloaded to accept an object but not a TimeSpan,
an implicit conversion to object took place, which manifested in
typeInfo.ConvertedType.
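Boxing shows the same distinction. In the sketch below, the literal 42 in object o = 42; has Type int and ConvertedType object, and the C# GetConversion extension method describes the conversion itself. The sketch assumes the Microsoft.CodeAnalysis.CSharp package. ConversionDemo and Inspect are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class ConversionDemo
{
  // Hypothetical helper: returns the TypeInfo and the conversion for the
  // literal 42 in "object o = 42;".
  public static (TypeInfo Info, Conversion Conversion) Inspect()
  {
    var tree = CSharpSyntaxTree.ParseText (
      "class Program { static void Main() { object o = 42; } }");

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);

    var literal = tree.GetRoot().DescendantNodes()
      .OfType<LiteralExpressionSyntax>()
      .Single();

    // GetConversion (from CSharpExtensions) classifies the implicit conversion
    // that takes the literal from its Type to its ConvertedType.
    return (model.GetTypeInfo (literal), model.GetConversion (literal));
  }
}
```

In this example, Type is int, ConvertedType is object, and the returned Conversion reports IsImplicit and IsBoxing as true.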


Looking up symbols 


A powerful feature of the semantic model is the ability to ask for all symbols in 
scope at a particular point in the source code. The result is the basis for IntelliSense 
listings, when the user requests a list of available symbols. 


To obtain the listing, simply call LookupSymbols, with the desired source code offset. 
Here's a complete example: 


var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static void Main()
  {
    int x = 123, y = 234;
  }
}");


var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES") ) 
.Split (Path.PathSeparator) 
.Select (path => MetadataReference.CreateFromFile (path)); 


var compilation = CSharpCompilation.Create ("test") 
.AddReferences (references) 
.AddSyntaxTrees (tree); 


SemanticModel model = compilation.GetSemanticModel (tree); 


// Look for available symbols at start of 6th line: 
int index = tree.GetText().Lines[5].Start; 


foreach (ISymbol symbol in model.LookupSymbols (index))
  Console.WriteLine (symbol.ToString());


Here’s the result: 


y
x
Program.Main()
object.ToString()
object.Equals(object)
object.Equals(object, object)
object.ReferenceEquals(object, object)
object.GetHashCode()
object.GetType()
object.~Object()
object.MemberwiseClone()
Program
Microsoft
System
Windows


(If we imported the System namespace, we'd see hundreds more symbols, for types
in that namespace.) 
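LookupSymbols also accepts an optional name, which narrows the result to symbols with that name. The sketch below checks whether a local x is visible at the start of two different lines of a small program. It assumes the Microsoft.CodeAnalysis.CSharp package. LookupDemo and CountInScope are hypothetical names:

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class LookupDemo
{
  const string Source = @"class Program
{
  static void Main()
  {
    int x = 123;
  }
}";

  // Hypothetical helper: counts the symbols called 'name' that are in scope
  // at the start of the given zero-based line of Source.
  public static int CountInScope (int line, string name)
  {
    var tree = CSharpSyntaxTree.ParseText (Source);

    var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
      .Split (Path.PathSeparator)
      .Select (path => MetadataReference.CreateFromFile (path));

    var compilation = CSharpCompilation.Create ("test")
      .AddReferences (references)
      .AddSyntaxTrees (tree);

    SemanticModel model = compilation.GetSemanticModel (tree);
    int position = tree.GetText().Lines [line].Start;

    // The optional 'name' parameter filters the lookup by symbol name.
    return model.LookupSymbols (position, name: name).Length;
  }
}
```

At the start of line 5 (inside Main's block), x is found. At the start of line 2 (inside the class but outside the method), it is not.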


Example: Renaming a Symbol 


To illustrate the features we've covered, let’s write a method to rename a symbol, 
which is robust to the most common use cases; in particular: 


• The symbol can be a type, member, local variable, range variable, or loop variable.
• You can specify the symbol from either its use or its declaration.
• With a class or struct, it will rename the static and instance constructors.
• In the case of a class, it will rename the finalizer (destructor).


For brevity, we omit some checks, such as ensuring that the new name is not already 
in use, and that the symbol isn't an edge case for which the rename will fail. Our
method will consider just a single syntax tree, and so will have the following 
signature: 


public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token, 
string newName) 


One obvious way to implement this is to subclass CSharpSyntaxRewriter. However, 
a more elegant and flexible approach is to have RenameSymbol call a lower-level 
method that returns the text spans to be renamed: 


public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model, 
SyntaxToken token) 


This allows an editor to call GetRenameSpans directly and apply just the changes
(within an Undo transaction), avoiding the loss of editor state that might otherwise
result from replacing the entire text.


This makes RenameSymbol a relatively simple wrapper around GetRenameSpans. We 
can use SourceText’s WithChanges method to apply a sequence of text changes: 


public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  IEnumerable<TextSpan> renameSpans = GetRenameSpans (model, token);

  // TextChange isn't IComparable, so sort the changes by their TextSpan:
  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.Select (span => new TextChange (span, newName))
               .OrderBy (tc => tc.Span));

  return model.SyntaxTree.WithChangedText (newSourceText);
}


WithChanges throws an exception unless the changes are in order; this is why we
called OrderBy on the sequence of changes.


Now we must write GetRenameSpans. The first step is to find the symbol
corresponding to the token that we want to rename. The token can be part of either a
declaration or usage, so we first call GetSymbolInfo, and if the result is null, we call
GetDeclaredSymbol:


public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model, 
SyntaxToken token) 
{ 


var node = token.Parent; 


ISymbol symbol = model.GetSymbolInfo (node).Symbol
  ?? model.GetDeclaredSymbol (node);


if (symbol == null) return null; // No symbol to rename. 


Next, we need to find the symbol definitions. We can get this from the symbol’s 
Locations property. (Our consideration of multiple locations makes us robust to 
the scenario of partial classes and methods, although for the former to be useful, we 
would need to expand the example to work with multiple syntax trees): 


var definitions =
  from location in symbol.Locations
  where location.SourceTree == node.SyntaxTree
  select location.SourceSpan;


Now we need to find usages of the symbol. For this, we begin by looking for 
descendant tokens whose names match the symbol’s name because this is a fast way 
to weed out most tokens. Then, we can call GetSymbolInfo on the token’s parent 
node and see whether it matches the symbol we want to rename: 


var usages = 
from t in model.SyntaxTree.GetRoot().DescendantTokens() 
where t.Text == symbol.Name 
let s = model.GetSymbolInfo (t.Parent).Symbol 
where s == symbol 
select t.Span; 


Binding-related operations such as asking for symbol information
tend to be slower than operations that consider just text or
syntax trees. This is because the process of binding can require
searching for types in assemblies, applying type inference rules,
and checking for extension methods.
If the symbol is something other than a named type (local variable, range variable,
etc.), our job is done and we can return the definitions plus usages: 


if (symbol.Kind != SymbolKind.NamedType) 
return definitions.Concat (usages); 


If the symbol is a named type, we need to rename its constructors and destructor, if 
present. To do so, we enumerate the descendant nodes, looking for type declarations 
whose names match the one we want to rename. Then, we get its declared symbol, 
and if it matches the one we're renaming, we locate its constructor and destructor 
methods, returning the spans of their identifiers if present: 


var structors =
  from type in model.SyntaxTree.GetRoot().DescendantNodes()
                    .OfType<TypeDeclarationSyntax>()
  where type.Identifier.Text == symbol.Name
  let declaredSymbol = model.GetDeclaredSymbol (type)
  where declaredSymbol == symbol
  from method in type.Members
  let constructor = method as ConstructorDeclarationSyntax
  let destructor = method as DestructorDeclarationSyntax
  where constructor != null || destructor != null
  let identifier = constructor?.Identifier ?? destructor.Identifier
  select identifier.Span;


return definitions.Concat (usages).Concat (structors); 


} 


Here's the complete listing, along with an example of how to use it: 


void Demo()
{
  var tree = CSharpSyntaxTree.ParseText (@"class Program
{
  static Program() {}
  public Program() {}

  static void Main()
  {
    Program p = new Program();
    p.Foo();
  }

  void Foo() => Bar();
  void Bar() => Foo();
}
");

  var references = ((string)AppContext.GetData ("TRUSTED_PLATFORM_ASSEMBLIES"))
    .Split (Path.PathSeparator)
    .Select (path => MetadataReference.CreateFromFile (path));

  var compilation = CSharpCompilation.Create ("test")
    .AddReferences (references)
    .AddSyntaxTrees (tree);

  var model = compilation.GetSemanticModel (tree);
  var tokens = tree.GetRoot().DescendantTokens();

  // Rename the Program class to Program2:
  SyntaxToken program = tokens.First (t => t.Text == "Program");
  Console.WriteLine (RenameSymbol (model, program, "Program2").ToString());

  // Rename the Foo method to Foo2:
  SyntaxToken foo = tokens.Last (t => t.Text == "Foo");
  Console.WriteLine (RenameSymbol (model, foo, "Foo2").ToString());

  // Rename the p local variable to p2:
  SyntaxToken p = tokens.Last (t => t.Text == "p");
  Console.WriteLine (RenameSymbol (model, p, "p2").ToString());
}

public SyntaxTree RenameSymbol (SemanticModel model, SyntaxToken token,
                                string newName)
{
  IEnumerable<TextSpan> renameSpans =
    GetRenameSpans (model, token).OrderBy (s => s);

  SourceText newSourceText = model.SyntaxTree.GetText().WithChanges (
    renameSpans.Select (s => new TextChange (s, newName)));

  return model.SyntaxTree.WithChangedText (newSourceText);
}

public IEnumerable<TextSpan> GetRenameSpans (SemanticModel model,
                                             SyntaxToken token)
{
  var node = token.Parent;

  ISymbol symbol =
    model.GetSymbolInfo (node).Symbol ??
    model.GetDeclaredSymbol (node);

  if (symbol == null) return null;    // No symbol to rename.

  var definitions =
    from location in symbol.Locations
    where location.SourceTree == node.SyntaxTree
    select location.SourceSpan;

  var usages =
    from t in model.SyntaxTree.GetRoot().DescendantTokens()
    where t.Text == symbol.Name
    let s = model.GetSymbolInfo (t.Parent).Symbol
    where s == symbol
    select t.Span;

  if (symbol.Kind != SymbolKind.NamedType)
    return definitions.Concat (usages);

  var structors =
    from type in model.SyntaxTree.GetRoot().DescendantNodes()
                      .OfType<TypeDeclarationSyntax>()
    where type.Identifier.Text == symbol.Name
    let declaredSymbol = model.GetDeclaredSymbol (type)
    where declaredSymbol == symbol
    from method in type.Members
    let constructor = method as ConstructorDeclarationSyntax
    let destructor = method as DestructorDeclarationSyntax
    where constructor != null || destructor != null
    let identifier = constructor?.Identifier ?? destructor.Identifier
    select identifier.Span;

  return definitions.Concat (usages).Concat (structors);
}
Index


Symbols


! (logical negation operator), 41, 552
! (null-forgiving operator), 191
!= (inequality operator), 44, 187, 297, 304
# (hash symbol), 551
# (hash), preceding preprocessor directives, 224
$ (dollar sign)
preceding interpolated strings, 47, 248
in regular expressions, 1004
% (remainder operator), 39
& (ampersand)
address-of operator, 220
bitwise AND operator, 41, 132
in parameter type names, 801
&& (conditional and operator), 44, 552
' (quote, single)
enclosing char literals, 45
following generic type names, 800
() (parentheses), 26, 65
(?x) IgnorePatternWhitespace, 999
* (asterisk)
as dereference operator, 220
as multiplication operator, 23, 26, 39
in regular expressions, 1002
+ (plus sign)
addition operator, 39
combining delegate instances, 151
in nested type names, 800
in regular expressions, 1002
string concatenation operator, 47
++ (increment operator), 39
+= (delegate variable assignment), 151
+= operator
combining delegate instances, 151
event accessors, 163
subscribing to events, 158
+∞ (positive infinity), 41
, (comma), 51
- (hyphen)
in regular expressions, 1001
- (minus sign)
negative infinity (-∞), 41
negative zero (-0), 41
removing delegate instances, 151
subtraction operator, 39
-- (decrement operator), 39
-= operator
event accessors, 163
removing delegate instances, 151
unsubscribing from events, 158
-> (pointer-to-member operator), 220
. (period), 26, 65
/ (forward slash)
as division operator, 39
trailing in URIs, 692
/* */ (multiline comments), 27, 226
// (forward slash, double), 27
/// (documentation comments), 226
: (colon), in named arguments, 62
:: (namespace alias qualification), 87
; (semicolon), 22, 26
< (comparison operator), 307
<< (shift left operator), 41
= (equal sign), as assignment operator, 27, 65
== (equality operator), 27, 44, 187, 249, 297, 304
Equals method versus, 301
string equality comparison, 250
=> (expression-bodied members), 92, 94, 100, 105, 217
=> (fat arrow notation), 92, 100
=> (lambda operator), 165
> (comparison operator), 307
>> (shift right operator), 41
? (question mark)
in nullable types, 185
in regular expressions, 997, 1003
?. (null-conditional operator), 69, 161, 189
?? (null coalescing operator), 69, 189
[] (square brackets)
array declaration, 23, 48, 51
in regular expressions, 1000
\ (backslash)
preceding escape sequences, 45
in regular expressions, 1000
\b (word boundary assertion), 1005
^ (caret)
bitwise exclusive OR operator, 41
in regular expressions, 1001, 1004
_ (discard symbol), 59, 76
{} (braces)
enclosing expressions in interpolated strings, 248
enclosing statement blocks, 22, 26, 70
in if statements, 73
in regular expressions, 1002
| (vertical bar)
bitwise OR operator, 41, 132
in regular expressions, 998
|| (conditional or operator), 44, 552
~ (complement operator), 41
~ (finalizer), 104, 531

A 


abstract classes, 111 

abstract members, 111 

access control lists (ACLs), 682 

access modifiers, 123-124 
accessibility capping, 124 
friend assemblies, 124 
restrictions on, 124 

accessors, 99, 163 

ACLs (access control lists), 682 

Action delegate, 154 

addressing systems, network, 689 

administrative elevation, 682 

Aes class, 871-875 

Aggregate operator, 464-466, 938 


AggregateException class, 595, 597, 630, 
956 
Flatten method, 957 
Handle method, 957 
parallel programming, 956 
aggregation methods, 462-466 
aggregation operators, 377 
ALC (see assembly load context) 
aliasing 
namespace alias qualifiers, 87 
types within namespaces, 85 
All method, 467 
alternator (|), 998 
Amazon Web Services (AWS), 875 
ambient property, 189 
Amdahl's law, 926 
ampersand (&) 
address-of operator, 220 
bitwise AND operator, 41, 132 
in parameter type names, 801 
anchors, 1004 
annotations, LINQ to XML, 496 
anonymous disposal, 527-528 
anonymous methods, 169 
anonymous pipes, 648, 650-652 
anonymous types, 195, 396 
Any method, 466 
APM (Asynchronous Programming 
Model), 633 
AppContext, 313 
AppDomain.CurrentDomain.BaseDirectory, 645, 674
application base directory, 645
application folder, 678
application layer, 688
application manifest, 682, 759
application servers, 893
Application.DispatcherUnhandledException, 586, 619
Application.ThreadException, 586
ApplicationData directory, 673, 678
ApplicationData.Current.TemporaryFolder, 676
arguments 
implications of passing by reference, 
5o: 
named, 62 
pass-by-value versus pass-by- 
reference, 56, 93 
passing to a Dynamic Method, 826
arithmetic operators, 39 
Array class, 327-335 

basics, 327-329 

construction and indexing, 329-331 

converting/resizing, 335 

copying, 335 

enumeration, 331 

length and rank, 331 

reversing elements, 334 

searching, 332 

sorting, 333-334 
array initialization expressions, 48, 52 
array pooling, 541 
array types, obtaining, 799 
Array... 

Array.ConvertAll method, 335 

Array.Sort, 334 

ArrayList, 336-338 
arrays, 48-53 

array initialization expressions, 48, 52 

bounds checking, 53 

covariance, 145 

default element initialization, 49 

indices and ranges, 49 

jagged, 51 

multidimensional, 50 

rectangular, 51 

simplified initialization expressions, 

D2 

type names, 801 

value types versus reference types, 49 
as operator, 109 
ASCII character set, 253 
AsEnumerable operator, 460 
AsEnumerable query operator, 403 
ASP.NET Core, 240 
AsQueryable operator, 460 
assemblies, 23, 757-796
application manifest, 759
applying attributes to, 206
Assembly class, 761
assembly manifest, 758
AssemblyName class, 764
Authenticode signing, 765-768
components, 757-761
defined, 4
emitting assemblies and types, 830-833
fully qualified names, 763
informational/file versions, 765
loading/resolving/isolating, 775-796
modules, 760
names, 763-765
reflecting, 817
resources, 768-775
Roslyn compiler, 1035
satellite assemblies, 773-775
specifying attributes, 759
strong names and assembly signing, 762
assembly load context (ALC), 776-796
Assembly.Load and contextual ALCs, 784-787
AssemblyDependencyResolver, 788
current ALC, 783
default ALC, 782
default probing, 783
EnterContextualReflection, 785-787
legacy loading methods, 789-791
LoadFile and Load(byte[]), 790
LoadFrom method, 790
LoadFromAssemblyName, 779
loading assemblies, 778
loading/resolving unmanaged libraries, 787
resolving assemblies, 779-782
unloading, 789
writing a plug-in system, 791-794
assembly resolution, 775, 779-782
Assembly...
Assembly class, 761
Assembly.GetType, 798
Assembly.Load, 776, 784-787
AssemblyBuilder, 830
AssemblyDependencyResolver, 788
AssemblyFileVersion, 765
AssemblyInformationalVersion, 765
AssemblyLoadContext, 776
AssemblyName, 764
AssemblyQualifiedName, 800
Assert method, 555
assignment expressions, 65
assignment operators, 66
associativity, operator, 66
asterisk (*)
as dereference operator, 220
as multiplication operator, 23, 26, 39
in regular expressions, 1002
asynchronous call graph, 602 
asynchronous functions, 605-624
asynchronous call graph execution, 614
asynchronous lambda expressions, 616
asynchronous methods in WinRT, 619
asynchronous streams, 616-618
asynchrony and synchronization contexts, 619
avoiding excessive bouncing, 624
awaiting, 605-611
defined, 606
optimizations, 621-624
parallelism, 615
precautions when using ValueTask<T>, 623
returning Task<TResult>, 612
synchronous completion, 621-623
ValueTask<T> and, 623
writing, 611-615
asynchronous lambda expressions, 616
asynchronous patterns, 625-633
Asynchronous Programming Model (APM), 633
BackgroundWorker class, 634
cancellation, 625-627
Event-Based Asynchronous Pattern, 634
obsolete patterns, 633-635
progress reporting, 627-628
task combinators, 629-633
task-based patterns, 629
asynchronous programming
continuations and, 602
principles, 601
Asynchronous Programming Model (APM), 633
asynchronous streams, 616-618
about, 12
IAsyncEnumerable<T> in ASP.NET Core, 618
querying IAsyncEnumerable<T>, 618
asynchrony, 600-605
(see also concurrency and asynchrony)
asynchronous functions in C#, 605-624
asynchronous lambda expressions, 616
asynchronous patterns, 625-633
asynchronous programming principles, 601
coarse-grained concurrency versus, 610
language support and, 603-605
principles of, 600-605
synchronous versus asynchronous operations, 600
AsyncLocal<T>, 917
atomicity, locking and, 886 
attributes, 204-207 
applying to assemblies and backing 
fields, 206 
attaching custom attributes to 
dynamic construct, 838 
attribute classes, 205 
AttributeUsage, 820 
basics, 818 
caller info attributes, 206 
controlling JSON serialization with, 
741-743 
defining your own, 820 
named/positional parameters, 205 
reflection, 818-823 
retrieving at runtime, 822 
specifying multiple attributes, 206 
AttributeUsage attribute, 820 
authentication 
client-side classes, 702-704 
CredentialCache and, 703 
via headers with HttpClient, 704 
Authenticode, 765-768 
assembly signing, 765-768 
code-signing certificate, 767 
signing with, 766 
signing with signtool.exe, 767 
time stamping, 768 
automatic garbage collection, 529-549 
automatic properties, 100 
autonomous tasks, 595 
AutoResetEvent, 903-906 
Average operator, 463 
AVL tree, 359 
await expressions, 605-611 
awaiting in a UI, 608-610 
capturing local state, 607 
locking and, 897
AWS (Amazon Web Services), 875 


B 


background garbage collection, 539 
background threads, 586 
BackgroundWorker class, 634 
backing fields, 206 
backing store streams, 643 
backslash (\) 
preceding escape sequences, 45 
in regular expressions, 1000 
banker's rounding, 283 
Barrier class, 910-911 
base class library (BCL), 4-5 
base constructors, 837 
base keyword, 113 
base types and interfaces, 801 
base-class constraint, 140 
BaseType property, 801 
Basic Multilingual Plane (BMP), 245, 256 
BCL (base class library), 4-5 
BigInteger struct, 289 
binary adapters, 659 
binary operators, 66 
binary serializer, 747-749 
basics, 728, 748 
ISerializable, 751-754 
[NonSerialized] attribute, 749
[OnDeserialized] attribute, 750 
[OnDeserializing] attribute, 749 
[OnSerializing] and [OnSerialized] 
attributes, 750 
[OptionalField] attribute and Versioning, 751
serialization attributes, 749-751 
subclassing serializable classes, 753 
BinaryFormatter, 748 
BinaryReader, 719 
BinaryWriter, 719 
binding 
dynamic binding, 208-216 
static versus dynamic, 208 
BindingFlags enum, 814 
bit-mapped attributes, 818 
BitArray class, 342 
BitConverter, 286 
bitwise operators, 41 


blocking 
spinning versus, 579 
threads, 578 
BlockingCollection<T>, 961-964 
using tasks, 963-964 
writing a producer/consumer queue, 
962-964 
BMP (Basic Multilingual Plane), 245, 256 
bool (Boolean) type and operators, 28, 
43-45 
conditional operators, 44 
equality and comparison operators, 44 
bounds checking, 53 
boxing, 117 
copying semantics, 118 
interfaces and, 129 
nullable values, 186 
braces ({}) 
enclosing expressions in interpolated 
strings, 248 
enclosing statement blocks, 22, 26, 70 
in if statements, 73 
in regular expressions, 1002 
branching, 827 
break statement, 79 
broadcaster type, 158 
BrotliStream, 661 
BufferedStream class, 652 
builder class, 359 
built-in types, 27 
byte arrays, 255 
byte type, 41 


C 


C# (generally) 

basics, 1-20 

brief history of features introduced 
from C# 2.0 through C# 8.0, 8-20 

frameworks, 4-5 

language basics, 21-87 

legacy/niche frameworks, 6 

memory management, 3 
(see also garbage collection [GC]) 

object orientation, 1 

simple program, 21-24 

syntax, 24-27 

type safety, 2 


Windows Runtime and, 7 
C# 8
array ranges and indices, 49 
asynchronous streams, 616-618 
default interface members, 129 
null-coalescing assignment operators, 
69 
nullable reference types, 191-193 
positional patterns, 203 
property patterns, 201 
static local methods, 93 
switch expressions, 77 
tuple patterns, 203 
using declarations, 175 
using indices and ranges with indexers, 103
CA (Certificate Authority), 766 
caching, 546 
calculated properties, 100 
call graph, 603, 614 
call sites, 852 
call-site caching, 853 
caller info attributes, 206 
calling site, 98 
CallSite<>, 851 
canceling a PLINQ query, 933 
cancellation tokens, 896, 933 
captured variables, 384, 584 
caret (^)
bitwise exclusive OR operator, 41 
in regular expressions, 1001, 1004 
cartesian product, 438 
Cast operator, 457 
casting, 108-110 
as operator, 109 
downcasting, 108 
introducing a pattern variable, 110 
is operator and, 110 
upcasting, 108 
catch clause, 172-173, 183 
CCW (COM-Callable Wrapper), 994 
centralized exception handling, 586 
Certificate Authority (CA), 766 
chaining 
encryption streams, 873 
extension methods, 194 
ChangeType method, 284 
char type, 45, 243 
character sets, 253, 1001 
characters, accessing within strings, 246 


checked operator, 40 
child tasks, 948, 952 
chunk partitioning, 936 
circular dependencies, 842-843 
class constraint, 141 
classes, 1, 23, 89-106 
abstract, 111 
anonymous types, 195 
constants, 90-91 
constructors and inheritance, 114 
deconstructors, 95-97 
fields, 89 
finalizers, 104 
inheritance (see inheritance) 
instance constructors, 94 
methods, 92-93 
nameof operator, 106 
object initializers, 97 
partial types/methods, 105 
properties, 99-101 
sealing functions and classes, 113 
static classes, 104 
static constructors, 103 
this reference, 98 
writing a class versus an interface,
130 
client-side classes, 692-706 
authentication, 702-704 
exception handling, 704 
HttpClient, 696-701 
proxies, 701 
WebClient, 693-694 
WebRequest and WebResponse, 
695-696 
Clone method, 328, 335 
cloning, 476 
Close method, 525 
closed generic types, 840 
closures, 167 
CLR (Common Language Runtime), 3 
C# members versus CLR members, 
808 
deadlocks and, 888 
indexer implementation, 103 
property implementation, 101 
type equivalence, 994 
coarse-grained concurrency, 610 
code point, 253 
Collection<T>, 351-353 
CollectionBase, 353
collections, 315-367 
Array class, 327-335 
customizable collections and proxies, 
351-356 
enumeration, 315-322 
ICollection and IList interfaces, 
323-326 
IComparer and Comparer, 363-364 
IEqualityComparer and EqualityCom- 
parer, 361-363 
immutable collections, 357-360 
IStructuralEquatable and IStructural- 
Comparable, 366 
lists, queues, stacks, and sets, 335-344 
plugging in equality and order, 
360-367 
StringComparer, 365 
collision, 870 
colon (:), in named arguments, 62 
COM (Component Object Model), 
990-996 
calling a COM component from C#, 
990-994 
dynamic binding, 993 
embedding interop types, 994 
enabling registry-free COM, 996 
exposing C# objects to, 994-996 
implicit ref parameters, 992 
indexers, 992 
IUnknown and IDispatch, 990
optional parameters and named argu- 
ments, 991 
purpose of, 989 
type equivalence, 994 
type system basics, 989 
COM interop types, 990 
COM-Callable Wrapper (CCW), 994 
comma (,), 51 
comments, 27 
Common Language Runtime (see CLR) 
CommonApplicationData, 673
Compare method, 249, 251 
CompareOrdinal method, 249, 251 
Comparer, 363-364 
CompareTo, 249, 251, 428 
comparison operators, 44, 217, 307 
comparisons 
ordinal versus culture comparison, 
250 
string equality comparisons, 250 
string order comparison, 251 
string types, 249-252 
compilation 
in C#, 23 
in Roslyn, 1033-1043 
complement operator (~), 41 
Complex struct, 290 
Component Object Model (see COM) 
composite format string, 248 
composite formatting, 273
Compressed file attribute, 667 
compression streams, 661-664 
compressing in memory, 662 
Unix gzip file compression, 663 
Concat operator, 248, 456 
concatenation, 47 
concurrency and asynchrony, 575-635 
common concurrency scenarios, 575 
principles of asynchrony, 600-605 
tasks, 592-600 
threading, 576-592 
concurrent collections, 958-961 
ConcurrentBag<T>, 960 
IProducerConsumerCollection<T>, 
959 
concurrent garbage collection, 540 
(see also background garbage collec- 
tion) 
concurrent operations, 625-627 
ConcurrentBag<T>, 960 
ConcurrentQueue<T>, 535 
conditional (ternary) operator, 45 
conditional and operator (&&), 44, 552 
Conditional attribute, 225, 553-554 
conditional compilation, 551-554 
Conditional attribute, 553-554 
static variable flags versus, 552 
conditional continuations, 953 
conditional or operator (||), 44, 552 
ConfigureAwait, 624 
Console class, 273, 309 
constant pattern, 204 
constants, 90-91 
ConstrainedCopy method, 335 
constraints, 140-141 
constructors, 29 
deconstructors and, 95-97
emitting, 837 
field initialization order, 115 
implicit calling of the parameterless 
base-class constructor, 115 
implicit parameterless, 95 
inheritance and, 114 
instance constructor and field initiali- 
zation order, 95 
instance constructors, 94 
nonpublic constructors, 95 
overloading, 94 
static, 103 
Contains method, 246, 466 
contextual keywords, 26 
continuations 
asynchronous programming and, 602 
child tasks and, 952 
conditional, 953 
event wait handles and, 908 
exceptions and, 951 
multiple antecedents with, 954 
multiple continuations on a single 
antecedent, 954 
task parallelism, 950-954 
Task<TResult> and, 951 
tasks and, 596-597 
continue statement, 80 
ContinueWith method, 950-954 
contravariance, 146, 156 
conversions, 30 
enums, 131, 292-294 
generic types, 142 
implicit/explicit, 30 
implicit/explicit nullable conversions, 
186 
LINQ methods, 457-460 
numeric types, 38, 288 
operator overloading and, 218 
Convert class, 283-284 
banker's rounding, 283 
base-64 conversions, 284 
dynamic conversions, 284 
parsing numbers in base 2, 8, and 16, 
284 
rounding real to integral conversions, 
283 
ConvertTime method, 266 
cookies, 709 
Coordinated Universal Time (UTC), 258, 
265 
Copy method, 335 
CopyTo method, 335, 967 
core dump, 572 
correlated subquery, 431 
Count operator, 462 
CountdownEvent, 907 
covariance, 143-146, 157 
CreateViewAccessor method, 685
CredentialCache, 703 
CredentialCache.DefaultNetworkCreden- 
tials, 702, 704 
cross join, 438 
cross-process EventWaitHandle, 908 
cryptography and encryption, 867-879 
hashing, 869-871 
options in .NET, 867 
public-key encryption/signing, 
876-879 
symmetric encryption, 871-876 
Windows Data Protection, 868 
CryptoStream, 871-872 
CSharp... 
CSharpCompilation, 1034 
CSharpCompilation.Create, 1035 
CSharpCompilationOptions, 1034 
CSharpParseOptions, 1022 
CSharpSyntaxRewriter, 1032, 1041 
CSharpSyntaxTree, 1022 
CSharpSyntaxTree.Create, 1022 
CSharpSyntaxTree.Parse, 1028 
CSharpSyntaxWalker, 1026
.csproj file, 226
culture-sensitive string comparison, 250
CultureInfo, 252, 272 
CultureInfo.CurrentCulture, 271 
CultureInfo.CurrentUICulture, 773 
cultures and subcultures, 774 
custom attributes, 819 
custom binding, 209 
custom conversion, 143 
custom types, 28-30 
constructors and instantiation, 29 
equality and, 301-306 
instance versus static members, 29 
members of a type, 29 
public keyword, 30 
symmetry of predefined/custom types, 29
customizable collections and proxies, 
351-356
Collection<T> and CollectionBase,
351-353
KeyedCollection<TKey,TItem> and
DictionaryBase, 353-356
ReadOnlyCollection<T>, 356 


D 


data contract serializer, 728 
data members, 29 
data parallelism, 924 
data transfer object (DTO), 434 
DataReader type, 722 
dates and times, 256-269 
dates and time zones, 264-269 
DateTime and DateTimeOffset, 
258-264 
parsing with regular expressions, 1010 
TimeSpan, 256 
DateTime, 258-264 
constructing, 259 
DateTimeOffset versus, 258 
daylight saving time and, 268 
format strings, 279-281 
formatting/parsing datetimes, 263 
null values, 263 
parsing/misparsing, 280 
returning current date/time with, 261 
time zones and, 264 
working with dates and times, 262 
DateTime.MinValue, 263 
DateTimeFormatInfo, 273 
DateTimeOffset, 258, 296 
constructing, 260 
DateTime versus, 258 
format strings, 279-281 
formatting/parsing datetimes, 263 
null values, 263 
returning current date/time with, 261 
time zones and, 265 
working with dates and times, 262 
DateTimeStyles, 274, 281 
daylight saving time, 268 
DbContext, 405-410 
change tracking, 411 
configuring the connection, 405 
configuring the model, 406 
creating the database, 408 
disposing, 409 
object tracking, 410 
using, 409 
Debug and Trace classes, 555-558 
Fail and Assert methods, 555 
flushing and closing listeners, 557 
TraceListener, 556 
Debugger class, 558 
attaching and breaking, 558 
attributes, 558 
DebuggerHidden attribute, 558 
DebuggerStepThrough attribute, 558 
declaration statements, 71 
declarative parallelism, 925 
declared symbols, 1039 
Deconstruct method, 95, 203 
deconstructing assignment, 96 
deconstructors, 15, 95-97 
decorator sequence, 385 
decrement operator (--), 39 
deep cloning, 476 
default keyword, 55, 139 
default scheduler, 955 
DefaultIfEmpty operator, 460
deferred execution 
captured variables, 384 
chaining decorators, 386 
EF Core queries, 415 
how queries are executed, 387 
LINQ queries, 382-388 
mechanism of operation, 385 
reevaluation, 383 
subqueries and, 391 
deferred loading, 500 
#define directive, 551 
DefineMethodOverride, 835 
definite assignment policy, 54 
DeflateStream, 661-663 
delegate type, 149 
delegate variable assignment (+=), 151 
Delegate.CreateDelegate, 803 
delegates, 149-158 
calling dynamically instantiated dele- 
gates, 813 
compatibility, 156-158 
dynamically instantiating, 803 
Func and Action delegates, 154
generic delegate type parameter var- 
iance, 157 
generic delegate types, 153 
instance versus static method targets, 
152 
interfaces versus, 154 
multicast, 151 
parameter compatibility, 156 
return type compatibility, 157 
type compatibility, 156 
writing plug-in methods with, 150 
DelegatingHandler, 700 
DescendantNodes method, 479 
Descendants method, 479 
diagnostics, 551-573 
conditional compilation, 551-554 
cross-platform tools, 569 
Debug and Trace classes, 555-558 
debugger integration, 558 
performance counters, 564-568 
processes and process threads, 559 
StackTrace and StackFrame, 560-562 
Stopwatch class, 569 
Windows event logs, 562-564 
dictionaries, 344-350 
Dictionary<TKey,TValue> and Hashtable,
347-348
Dictionary, 346 
IDictionary<TKey,TValue>, 345
ListDictionary and HybridDictionary, 
349 
OrderedDictionary, 348 
sorted, 349 
dictionary attack, 870 
Dictionary<TKey,TValue>, 347-348
DictionaryBase, 355 
digital signing, 878 
directories, 676 
(see also file and directory operations) 
Directory class, 669 
DirectoryInfo class, 670 
disassembler, 844-849 
discard symbol (_), 59, 76 
discards, 14, 59 
disposal 
anonymous, 527-528 
calling Dispose from a finalizer, 532 
clearing fields, 526 
defined, 523 
encryption objects, 875 
IDisposable, Dispose, and Close, 
523-528 
standard disposal semantics, 524 
wait handles, 904 
when to dispose, 525-526 
Dispose method, 523-528, 532 
Distinct operator, 429 
division, 39 
DLL (Dynamic Link Library) 
callbacks from unmanaged code, 980 
calling into native DLLs, 975 
mapping structs to unmanaged mem- 
ory, 984-988 
simulating C unions, 981 
type marshaling, 976-980 
.dll files, 24 
DLR (Dynamic Language Runtime), 209, 
851-852 
DNS (Domain Name Service), 715 
Dns class, 715 
do-while loops, 78 
document object model (see DOM) 
documentation comments (///), 226 
dollar sign ($) 
preceding interpolated strings, 47, 248 
in regular expressions, 1004 
DOM (document object model) 
basics, 469 
expression DOM, 418 
Domain Name Service (DNS), 715 
dotnet tool, 24 
dotnet-counters tool, 569 
dotnet-dump tool, 572 
dotnet-trace tool, 571 
double type, 42 
double-checked locking, 913 
downcasting, 109 
DownloadDataTaskAsync, 609 
DPAPI (Windows Data Protection API), 
868 
DriveInfo class, 674
DTO (data transfer object), 434 
dump, 572 
dynamic binding, 208-216 
on COM types, 993 
conversions, 212 
custom binding, 209 
dynamic calls without dynamic receiv-
ers, 213 
dynamic expressions, 213 
language binding, 210 
runtime representation of dynamic, 
211 
RuntimeBinderException, 211 
static binding versus, 208 
static types in dynamic expressions, 
214 
uncallable functions, 215 
var versus dynamic types, 212 
dynamic calls, 213 
dynamic code generation, 823-830 
branching, 827 
exception handling, 829 
generating IL with DynamicMethod, 
823-825 
generating local variables, 826 
instantiating objects and calling 
instance methods, 828-830 
passing arguments to a Dynamic 
Method, 826 
dynamic construct, 838 
dynamic conversions, 284 
dynamic expressions, 213 
Dynamic Language Runtime (DLR), 209, 
851-852 
dynamic languages, 863-865 
Dynamic Link Library (DLL) (see DLL) 
dynamic objects, 860-863 
dynamic programming, 851-865 
Dynamic Language Runtime (DLR), 
851-852 
dynamic member overload resolution, 
854-860 
implementing dynamic objects, 
860-863 
interoperating with dynamic lan- 
guages, 863-865 
numeric type unification, 853 
dynamic receivers, dynamic calls without, 
213 
dynamic type, 212 
DynamicMethod 
generating IL with, 823-825 
passing arguments to, 826 
DynamicObject, 860-863 
DynamicVisit method, 855 


E 


EAP (Event-Based Asynchronous Pat- 
tern), 634 
EF Core, 404-416 
adding and removing entities from 
navigation collections, 413 
change tracking, 411 
DbContext, 405-410 
deferred execution, 415 
entity classes, 404 
GroupBy in, 455 
loading navigation properties, 413 
navigation properties, 412-414 
object tracking, 410 
SelectMany in, 439-440 
subqueries and joins in, 432-433 
8- and 16-bit integral types, 41 
Element method, 478 
element operators, 377, 460-461 
element typing, 375 
ElementAt operator, 461 
elements, 369 
#elif directive, 552 
#elif statement, 552 
#else directive, 552 
#else statement, 552 
else clause, 73 
"Elvis" (null-conditional) operators, 69 
email 
receiving POP3 mail with TCP, 
720-722 
sending mail with SmtpClient, 716 
embarrassingly parallel problems, 926 
Empty operator, 467 
empty strings, 246 
Encoding object, 254 
Encoding.GetEncoding, 254 
Encrypted file attribute, 667 
encryption (see cryptography and encryp- 
tion) 
#endif directive, 551 
EndsWith method, 246 
EnterAsync extension method, 898 
EnterContextualReflection, 785-787 
Enum... 
Enum.GetNames, 294 
Enum.GetValues, 294 
Enum.Parse, 294 
Enum.ToObject, 293
Enumerable... 
Enumerable.AsEnumerable operator, 
403 
Enumerable.ElementAt, 461 
Enumerable.Except, 457 
Enumerable.Join, 449 
Enumerable.Where, 426 
enumeration, 179, 315-322 
Array class, 331 
collection initializers, 180 
collections, 315-322 
IEnumerable and IEnumerator, 316 
IEnumerable<T> and IEnumera- 
tor<T>, 317-319 
implementing the enumeration inter- 
faces, 318-322 
enumerators, 179, 970-972 
enums, 131-134, 291-295 
conversions, 131, 292-294 
defined, 131 
enum to integral conversions, 292 
enumerating enum values, 294 
Flags attribute, 132 
format strings, 282 
integral to enum conversions, 293 
operators that work with, 133 
semantics of, 294 
string conversions, 294 
type-safety issues, 133 
Environment.SpecialFolder, 673 
equal sign (=), as assignment operator, 27, 
65 
equality comparison, 249, 296-306 
changing the meaning of equality, 302 
equality and custom types, 301-306 
implementing IEquatable<T>, 305 
overloading == and !=, 304 
overriding equality semantics, 302 
overriding Equals, 304 
overriding GetHashCode, 303 
pluggable equality operators, 306 
speeding up with structs, 302 
standard equality protocols, 297-301 
value equality versus referential equal- 
ity, 296 
equality operator (==), 27, 44, 187, 249, 
297, 304 
Equals method versus, 301 
overloading, 217, 304 
string equality comparison, 250 
EqualityComparer, 361-363 
EqualityComparer<T>, 299 
EqualityComparer<T>.Default, 363 
Equals method 
equality operator (==) versus, 301 
IComparable versus, 307 
#error preprocessor directive, 224 
escape sequences, 45, 1000 
evaluation stack, 825 
event accessors, 159 
event wait handles, 903-910 
AutoResetEvent, 903-906 
continuations and, 908 
CountdownEvent, 907 
cross-process, 908 
disposal, 904 
ManualResetEvent, 906 
signaling with, 903-910 
WaitAny, WaitAll, and SignalAndWait, 
909 
Event-Based Asynchronous Pattern 
(EAP), 634 
EventLog class, 562-564 
events, 2, 158-165 
accessors, 163 
mechanism of operation, 159 
modifiers, 165 
standard event pattern, 160-163 
weak references and, 547-549 
Except operator, 457 
exception filter, 173 
exception handling 
centralized, 586 
client-side classes, 704 
dynamic code generation, 829 
threading and, 584 
exception posting, 619 
exceptions, 175-177 
(see also try statements and excep- 
tions) 
autonomous tasks and, 595 
continuations and, 951 
tasks and, 595 
throwing, 175-177 
exclusive locking, 882-890 
choosing the synchronization object, 
885 
deadlocks, 888
lock statement, 883 
locking and atomicity, 886 
lockTaken overloads, 884 
Monitor.Enter and Monitor.Exit, 884 
Mutex, 889 
nested locking, 887 
performance, 889 
TryEnter method, 884 
when to lock, 885 
.exe files, 24 
execution interface, 7 
ExpandoObject, 863 
explicit conversions, 30, 38 
explicit loading, 414 
expression DOM, 418 
expression statements, 71 
expression trees, 416-420 
compiling, 417 
delegates versus, 416-420 
lambda expressions and, 165 
expression-bodied members (=>), 92, 94, 
100, 105, 217 
expression-bodied methods, 15, 92 
expression-bodied properties, 100 
expressions, 64-66 
assignment expressions, 65 
primary expressions, 65 
switch expressions, 77 
void expressions, 65 
Extensible Application Markup Language 
(XAML), 285 
extension methods, 193-195, 374 
ambiguity and resolution, 194 
chaining, 194 
EnterAsync, 898 
instance methods versus, 195 
namespaces, 194 
precedence among, 195 
extern aliases, 85 


F 


Fail method, 555 
fat arrow notation (=>), 92, 100 
fields, 89 
clearing in disposal, 526 
constructor and field initialization 
order, 95 
constructors and field initialization 
order, 115 
declaring multiple fields together, 90 
emitting, 835-837 
initialization, 90 
properties versus, 99 
readonly modifier, 90 
static constructors and field initializa- 
tion order, 103 
file and directory operations, 665-675 
catching filesystem events, 674 
Directory class, 669 
File class, 666-669 
file I/O in UWP, 676-680 
FileInfo and DirectoryInfo, 670 
Path class, 671-673 
querying volume information, 674 
special folders, 673 
File class 
compression and encryption 
attributes, 667 
file security, 668 
shortcut methods, 645 
file compression, 663 
FileInfo class, 670 
FileMode, 646 
filename, 645 
FileOpenPicker class, 679 
FileSecurity class, 668 
FileStream, 644-647 
advanced features, 647 
constructing, 644 
specifying a FileMode, 646 
specifying a filename, 645 
FileSystemWatcher class, 674
filtering, 425-429 
< and > string comparisons in EF 
Core, 428 
Distinct operator, 429 
SQL LIKE comparisons in EF Core, 
427 
Take and Skip, 428 
TakeWhile and SkipWhile, 429 
Where, 426-428 
WHERE x IN (..., ..., ...) in EF Core, 
428 
finalizers, 104 
calling Dispose from, 532 
GC and, 531-536 
GC.ReRegisterForFinalize, 535
resurrection, 534-536 
finally blocks, 174, 183 
fine-grained concurrency, 602 
First operator, 460 
FirstNode function, 477 
FirstOrDefault operator, 461 
fixed keyword, 987 
fixed statement, 220 
fixed-size buffers, 222 
Flags attribute, 132 
Flatten method, 957 
floating-point types, 36 
conversions, 38 
special float and double values, 41 
fluent syntax, 371-378 
chaining query operators, 372-374 
composing lambda expressions, 
374-376 
importance of extension methods, 374 
joining in, 445 
mixed-syntax queries, 381 
natural ordering, 377 
query syntax versus, 381 
FolderPicker class, 679 
for loops, 78 
foreach loops, 79 
foreground threads, 586 
form data, uploading, 707 
format item, 249 
formatters, serialization, 728 
formatting and parsing, 270-282 
BitConverter, 286 
composite formatting, 273 
Convert class, 283-284 
custom numeric format strings, 277 
Date/Time format strings, 279-281 
DateTimeFormatInfo, 273 
DateTimeStyles, 281 
enum format strings, 282 
enums, 291-295 
equality comparison, 296-306 
format providers, 271-275 
format providers and CultureInfo, 272 
format string, 249 
globalization, 286 
Guid struct, 295 
IFormatProvider and ICustomFormat- 
ter, 274 
NumberFormatInfo, 273 
NumberStyles, 278 
numeric format strings, 276
parsing with format providers, 274 
standard format strings and parsing 
flags, 275-282 
ToString and Parse methods, 270 
type converters, 285 
working with numbers, 288-291 
XmlConvert, 284 
FormatTransitionTime, 268 
forward slash (/) 
as division operator, 39 
trailing in URIs, 692 
forward slash, double (//), 21, 27 
forward-only enumerators, 970-972 
frameworks, 4-5 
.NET standard, 231-234 
C# language versions and, 234 
fundamentals, 243-313 
legacy/niche, 6 
overview, 231-242 
friend assemblies, 124 
FTP (File Transfer Protocol), 713-715 
fully qualified names, 763 
Func delegate, 154, 375 
function members, 2, 29 
functional construction, 474 
functional programming, 357 


G 


garbage collection (GC), 529-531 
array pooling, 541 
automatic, 529-531 
background collection, 539 
defined, 523 
finalizers, 531-536 
forcing, 540 
generational collection, 537 
Large Object Heap, 538 
managed memory leaks, 542-545 
memory consumption and, 530 
memory pressure, 541 
notifications, 540 
optimization, 536-540 
process of, 536-542 
resurrection, 534-536 
roots, 530 
tuning at runtime, 541
weak references, 545-549 
workstation versus server collection, 
539 
GC (see garbage collection) 
GC...
GC.AddMemoryPressure, 541 
GC.Collect, 540 
GC.EndNoGCRegion, 541 
GC.RegisterForFullGCNotification, 
540 
GC.RemoveMemoryPressure, 541 
GC.ReRegisterForFinalize, 535 
GC.SuppressFinalize, 534 
GC.TryStartNoGCRegion, 541
GC.WaitForFullGCApproach, 540 
GC.WaitForFullGCComplete, 540
GCNotificationStatus, 540 
GCSettings.LatencyMode, 541 
generational garbage collection, 537 
generic interface, 815 
generic methods, 814 
defining, 839 
emitting, 838-844 
retrieving/invoking, 812 
generic types 
anonymously calling members of, 
857-860 
C++ templates versus, 147 
defining, 840 
delegate types, 153 
emitting, 838-840 
obtaining member metadata, 810 
reflection, 804 
type names, 800 
generics, 135-148 
C# generics versus C++ templates, 147 
constraints, 140-141 
contravariance, 146 
covariance, 143-146 
declaring type parameters, 138 
default generic value, 139 
generic methods, 137 
generic types, 136 
purpose of, 137 
self-referencing generic declarations, 
142 
static data, 142 
subclassing generic types, 141 
type parameters and conversion, 142 
typeof operator, 139 
unbound generic type, 139 
get accessor, 101 
Get... 
GetAsync method, 698 
GetBytes method, 255 
GetData method, 916 
GetEncodings method, 254 
GetEnumerator method, 179, 320 
GetHashCode, 303 
GetSemanticModel, 1036-1041 
GetString method, 255 
GetSymbolInfo, 1038 
GetType method, 118 
GetUnicodeCategory method, 245 
GetValue method, 330 
global keyword, 87 
globalization, 286 
checklist, 287 
defined, 286 
testing against different cultures, 287 
goto statement, 80 
greedy quantifiers, 1003 
GroupBy operator, 453-455 
grouping (LINQ), 452-455 
GroupJoin operator (LINQ), 445-449 
enumerable implementations, 449 
flat outer joins, 447 
joining with lookups, 447 
groups 
named groups, 1007 
regular expressions, 1006-1007 
Guid struct, 295 
Guid.NewGuid, 295 
gzip file compression, 663 
GZipStream, 661 


H 


Handle method, 957 
hash (#), 224, 551
hash code, 303, 869
hash partitioning, 935
hashing, 869-871
hash algorithms in .NET Core, 870
passwords, 870
hashing algorithm, 869
HashSet<T>, 342 
Hashtable class, 347-348
hashtables, 303 
headers, 706 
heap, 53 
HideBySig, 835 
hiding inherited members, 113 
hiding members, 112 
HTTP (Hypertext Transfer Protocol) 
cookies, 709 
headers, 706 
networking and, 706-710 
query strings, 707 
request and response features of Web- 
Client, 706-710 
uploading form data, 707 
writing an HTTP server, 710-713 
HttpClient, 693, 696-701 
authentication via headers with, 704 
chaining handlers with Delegating- 
Handler, 700 
cookies, 709 
custom headers, 706 
exception handling, 704 
GetAsync and response messages, 698 
HttpMessageHandler, 699 
proxies with, 701 
SendAsync and request messages, 698 
unit testing and mocking, 699 
uploading data and HttpContent, 698 
uploading form data, 708 
HttpClientHandler, 697, 701, 709 
HttpContent, 698 
HttpListener, 710-713 
HttpMessageHandler, 699 
HttpRequestMessage, 698 
HttpResponseMessage, 698 
HttpWebRequest, 709
HybridDictionary, 349 
hyphen (-), 1001 


I


IANA (Internet Assigned Numbers
Authority) Character Set, 254
IAsyncAction, 628 
IAsyncDisposable, 617 
IAsyncEnumerable<T>, 617 
in ASP.NET Core, 618
querying, 618
IAsyncOperation<TResult>, 619, 628 
IAsyncOperationWithProgress<TResult>,
619 
IAsyncResult, 633 
ICollection, 323-326 
ICollection<T>, 323-325 
IComparable interface, 306-308, 451 
Equals versus, 307 
implementing, 308 
IComparer, 307, 363-364 
IConvertible, 283 
ICustomFormatter, 274 
IDbConnection, 525 
identifiers, 25-26 
IDictionary, 346 
IDictionary<TKey,TValue>, 345
IDispatch interface, 990
IDisposable, 523-528 
anonymous disposal and, 527-528 
IEnumerable<T> and, 318 
IDynamicMetaObjectProvider (IDMOP), 
209, 860 
IEnumerable, 316, 318-322 
IEnumerable<char>, 246 
IEnumerable<T>, 182, 317-319 
IDisposable and, 318 
implementing the enumeration inter- 
faces, 318-322 
LINQ conversion methods and, 
457-460 
IEnumerator, 316 
IEnumerator<T>, 182, 317-319 
IEqualityComparer, 306, 361-363 
IEquatable<T>, 300, 305 
#if directive, 551 
if statement, 72 
IFormatProvider, 274 
IFormattable, 271 
IgnorePatternWhitespace, 999
IL (intermediate language) 
CLR and, 3 
dynamic code generation, 823-830 
evaluation stack, 825 
exception handling, 829 
generating with DynamicMethod, 
823-825 
instantiating objects and calling 
instance methods, 828-830 
parsing, 844-849 
writing a disassembler, 844-848
ildasm tool, 4 
IList, 323-326 
IList<T>, 323, 325 
immutable collections, 357-360 
builders, 359 
creating, 358 
manipulating, 358 
performance, 359 
immutable objects, 895 
ImmutableArray<T>, 359 
ImmutableList<T>, 359 
imperative parallelism, 925 
implicit conversions, 30, 38 
implicit ref parameters, 992 
implicit typing, 64 
in parameter, 60 
increment operator (++), 39 
indexed filtering, 427 
indexers, 102 
CLR indexer implementation, 103 
implementing, 102 
using indices and ranges with, 103 
IndexOf method, 246 
IndexOfAny method, 247 
indices 
arrays and, 50 
bounds checking, 53 
inequality operator (!=), 44, 187, 297, 304 
inheritance, 106-116 
abstract classes/abstract members, 111 
base keyword, 113 
casting and reference conversions, 
108-110 
constructors and, 114 
hiding inherited members, 112 
overloading and resolution, 115 
polymorphism, 107 
sealing functions and classes, 113 
virtual function members, 111 
initialization 
fields, 90 
instance constructor and field initiali- 
zation order, 95 
lazy (see lazy initialization) 
object initializers, 97 
property initializer, 101 
static constructor and field initializa- 
tion order, 115 
Initialization Vector (IV), 872 
INotifyPropertyChanged interface, 411 
INotifyPropertyChanging interface, 411 
instance constructors, 94 
constructor and field initialization 
order, 95 
implicit parameterless, 95 
nonpublic constructors, 95 
overloading, 94 
instance fields, ThreadLocal<T> and, 915 
instance methods 
delegates and, 152 
extension methods versus, 195 
generating, 834 
instance, static members versus, 29 
instantiating a type, 29, 802-803 
instruction atomicity, 887 
int type, 27 
integral types 
conversions, 38 
8- and 16-bit integral types, 41 
enum conversions, 292 
specialized operations on, 39 
integral-type literals, 36 
interface, 1, 125-130 
boxing and, 129 
default implementation (C# 8), 129 
delegates versus, 154 
explicit implementation, 126 
extending, 126 
reimplementing in a subclass, 127-129 
virtual implementation, 127 
when to use nongeneric interfaces, 319 
writing a class versus an interface,
130 
interface constraint, 140 
internal access modifier, 123 
internationalization, 286 
Internet Assigned Numbers Authority 
(IANA) Character Set, 254 
interoperability, 975-996 
(see also COM [Component Object 
Model]; DLL [Dynamic Link 
Library]) 
mapping structs to unmanaged mem- 
ory, 984-988 
shared memory, 982-984 
simulating C unions, 981 
interpolated string, 47 
interpolation, 47
interpreted queries, 398-404 
AsEnumerable operator, 403 
combining local queries with, 402 
execution, 401 
mechanism of operation, 400
interprocess communication (IPC), 648 
Intersect operator, 457 
into keyword, 393 
IOrderedEnumerable, 452
IOrderedQueryable, 452 
IPAddress class, 690 
IPEndPoint class, 690 
IProducerConsumerCollection<T>, 959 
IProgress<T>, 627 
IPv4/IPv6 addresses, 689 
IQueryable<T>, 398-402 
IReadOnlyCollection<T>, 326 
IReadOnlyList<T>, 326 
is operator, 110 
IsAccessible method, 1039 
ISerializable, 751-754 
IStructuralComparable, 366 
IStructuralEquatable, 366 
ISymbol interface, 1037-1038 
iteration statements, 77-79 
for loops, 78 
foreach loops, 79 
while and do-while loops, 77 
iteration variables, 168 
iterators, 181-184, 320 
composing sequences, 183 
semantics, 182-183 
try/catch/finally blocks, 183 
yield break statement, 182 
IUnknown interface, 990
IV (Initialization Vector), 872 
IXmlSerializable, 728, 736-738 


J 


jagged arrays, 51 
JIT (Just-In-Time) compilation, 3
Join method (strings), 248 
Join method (threading), 578 
Join operator (LINQ), 442-449 
basics, 443 
joining in fluent syntax, 445 
joining on multiple keys, 444 
joining, 442-449
SelectMany, 438
strings, 248
JSON, 516-522
JsonDocument, 519-522
Utf8JsonReader, 516-518
Utf8JsonWriter, 518
JsonCommentHandling, 518 
JsonConverterAttribute, 743
JsonDocument, 519-522 
LINQ and, 521
making updates with a JSON writer,
522
reading JSON arrays, 520
reading JSON objects, 520
reading simple values, 520
JsonExtensionDataAttribute, 742 
JsonIgnoreAttribute, 741 
JsonPropertyNameAttribute, 742
JsonSerializerOptions, 744-747
JsonSerializer, 738-747 
basics, 727, 738
controlling serialization with
attributes, 741-743
customizing data conversion, 743
JsonSerializerOptions, 744-747
serializing child objects, 739-740
serializing collections, 740
JsonWriterOptions, 518 
jump statements, 79-81 
break statement, 79
continue statement, 80
goto statement, 80
return statement, 80
throw statement, 81
Just-In-Time (JIT) compilation, 3 


K 


key management, 875 
keywords, 25-26 
KnownFolders class, 678 


L 


lambda expressions, 2, 165-169 
anonymous methods versus, 169 
asynchronous, 616 
captured variables and, 584 
capturing iteration variables, 168
capturing outer variables, 166-169 
composing, 374-376 
element typing and, 375 
explicitly specifying parameter types, 
166 
Func signatures and, 375 
local methods versus, 169 
subqueries, 388-391 
lambda operator (=>), 165 
language binding, 210 
Large Object Heap (LOH), 538 
Last operator, 460 
Last-In First-Out (LIFO), 341 
LastIndexOf method, 247 
LastNode function, 477 
late binding, 810 
lazy execution (see deferred execution) 
lazy initialization, 912-914 
Lazy<T> class, 913 
LazyInitializer class, 913
lazy quantifiers, 1003 
LazyInitializer class, 913
left-associative operators, 66 
let keyword, 397 
LIFO (Last-In First-Out), 341 
LIKE operator, 427 
LinkedList<T>, 338 
LINQ operators, 421-468 
aggregation methods, 462-466 
conversion methods, 457-460 
element operators, 460-461 
filtering, 425-429 
generation methods, 467 
grouping, 452-455 
joining, 442-449 
ordering, 450-452 
overview, 422-425 
projecting, 429-441 
quantifiers, 466 
sequence to element or value, 424 
sequence to sequence, 423-424 
set operators, 456 
void to sequence, 425 
Zip operator, 449 
LINQ queries, 369-420 
anonymous types, 396 
basics, 369-371 
composition strategies, 392-395 
deferred execution, 382-388 
fluent syntax, 371-378 
interpreted queries, 398-404 
into keyword, 393 
JsonDocument and, 521 
let keyword, 397 
object initializers, 395 
progressive query building, 392-393 
projecting into an X-DOM, 497-500 
projection strategies, 395-397 
query expressions, 378-382 
subqueries, 388-391 
wrapping queries, 394 
LINQ to XML, 469-500 
annotations, 496 
architectural overview, 469 
documents and declarations, 487-491 
LINQ to XML DOM, 470 
names and namespaces, 491-496 
X-DOM (see XML DOM) 
XML declarations, 489 
LINQ, JsonDocument and, 521 
LINQ-to-objects queries, 370 
list-like collections, 335-344 
BitArray, 342 
HashSet<T> and SortedSet<T>, 342 
LinkedList<T>, 338 
List<T> and ArrayList, 336-338 
Queue<T> and Queue, 340 
Stack<T> and Stack, 341 
List<T> class, 325, 336-338 
ListDictionary, 349 
Listeners, 556-558 
literals, 26 
Load(byte[]) method, 790 
LoadFile method, 790 
LoadFrom method, 790 
LoadFromAssemblyName method, 779 
local methods, 93 
in C# 7, 14 
lambda expressions versus, 169 
static local methods, 93 
local queries, 402 
local sequences, 369 
local variables, 22 
dynamic code generation, 826 
var keyword, 64 
LocalApplicationData, 673 
localization, 286 
lock statement, 81, 883
locking 
exclusive (see exclusive locking) 
immutable objects, 895 
locking around thread-safe objects, 
892 
nonexclusive (see nonexclusive locking)
read-only thread safety, 893 
static members, 892 
thread safety and, 582, 890-895 
thread safety and .NET Core types, 
891-893 
thread safety in application servers, 
893 
lockTaken overloads, 884 
logical negation operator (!), 41, 552 
LOH (Large Object Heap), 538 
LongCount operator, 462 
LookupSymbols, 1040 
loop iteration index, 942 


M 


Main method, 22 
man-in-the-middle attack, 876 
managed memory leaks, 542-545 
ManualResetEvent, 906 
ManualResetEventSlim, 906 
MatchEvaluator delegate, 1008 
Math class, 288 
Max operator, 462 
member overload resolution, 854-860 
member types, 807 
MemberInfo subclass, 807 
memory barrier, 886 
memory leaks 
diagnosing, 545 
managed memory leaks, 542-545 
timers and, 543 
memory management, 3 
mapping structs to unmanaged memory, 984-988
shared memory, 982-984 
memory, stack-allocated/unmanaged, 972 
memory-mapped files, 683-686 
(see also shared memory) 
cross-platform interprocess shared 
memory, 684 
random file I/O and, 683 
shared memory (Windows), 684 
working with view accessors, 685 
Memory<T>, 969-970 
MemoryStream, 647, 872 
metacharacters, 1000 
method group, 150 
method overloading, 115 
method parameters, 811-813 
MethodAttributes.HideBySig, 835
methods, 2, 92-93 
emitting, 833-835 
expression-bodied, 92 
generic, 137 
local methods, 93 
overloading, 92 
overriding, 835 
pass-by-value versus pass-by-reference 
parameters, 93 
purpose of, 22 
Microsoft Azure, 875 
Microsoft Dataflow, 923 
Min operator, 462 
minus sign (-) 
negative infinity (-∞), 41
negative zero (-0), 41
removing delegate instances, 151 
subtraction operator, 39 
mocking handler, 699 
modules, assembly, 760, 818 
Monitor.Enter, 884 
Monitor.Exit, 884 
Mono, 5 
MoveNext method, 182 
multicast delegates, 151 
multidimensional arrays, 50 
multiline comments (/* */), 27, 226 
multiple dispatch, 857 
MultipleActiveResultSets (MARS), 416 
multithreaded program, 576 
multithreaded timers, 919-920 
multithreading (see parallel programming)
Mutex, 889 


N 


naked type constraint, 141 
name hiding, 84 
name scoping, 83 
named arguments, 62 
named groups, 1007 
named pipes, 648-650 
nameof operator, 106 
namespace alias qualification (::), 87 
namespaces, 23, 81-87 
advanced features, 85-87 
alias qualifiers, 87 
aliasing types and, 85 
attributes, 492 
extension methods and, 194 
extern aliases, 85 
name hiding, 84 
name scoping, 83 
nested using directives within, 85
prefixes, 492 
repeated, 84 
rules within, 83-85 
using directive and, 82 
using static directive and, 83 
X-DOM, 493-495 
XML, 491-496 
XmlReader and, 508 
XmlWriter and, 510 
navigation properties, 412-414 
adding and removing entities from 
navigation collections, 413 
lazy loading, 414 
loading, 413 
negative lookahead, 1004 
negative lookbehind, 1004 
nested locking, 887 
nested types, 134-135 
obtaining, 799 
type names, 800 
.NET Core
about, 5 
CLR and BCL, 235-239 
collections (see collections) 
framework fundamentals, 243-313 
framework overview, 231-242 
.NET Core 3 new features, 232 
standard disposal semantics, 524 
.NET Framework 
about, 5 
application frameworks, 239-242 
compilation, 24 
.NET standard, 231-234 
.NET Framework and .NET Core 
compatibility, 234 
older standards, 234 
reference assemblies, 235 
.NET standard 2.0, 233 
.NET standard 2.1, 233 
networking, 687-723 
addresses and ports, 689 
architecture, 687 
client-side classes, 692-706 
concurrency with TCP, 719 
receiving POP3 mail with TCP, 
720-722 
sending mail with SmtpClient, 716 
TCP in UWP, 722-723 
URIs, 690-692 
using DNS, 715 
using FTP, 713-715 
using TCP, 717-720 
working with HTTP, 706-710 
writing an HTTP server, 710-713 
new keyword, 113 
Nodes function, 477 
nonexclusive locking, 896-903 
lock recursion, 902 
reader/writer locks, 898-903 
semaphore, 896-898 
upgradeable locks, 901-903 
nongeneric interfaces, 319 
nonpublic constructors, 95 
nonpublic members, 813 
null coalescing operator (??), 69, 189 
null operators, 69-70, 189 
null strings, 246 
null value, literal for, 34 
null-coalescing assignment operators, 69 
null-conditional operator (?.), 69, 161, 189 
null-forgiving operator (!), 191 
nullable annotation context, 192 
#nullable enable directive, 192 
nullable reference types, 191-193 
about, 11 
null-forgiving operator, 191 
separating annotation and warning 
contexts, 192 
treating nullable warnings as errors, 
183 
nullable types, 34 
nullable value types, 185-190 
alternatives to, 190
bool? with & and | operators, 188 
boxing/unboxing nullable values, 186 
implicit/explicit nullable conversions, 
186 
null operators and, 189 
Nullable<T> struct, 185 
operator lifting, 186 
scenarios for, 189 
Nullable<T> struct, 185 
NullReferenceException, 178, 191 
NumberFormatInfo, 273 
NumberStyles, 274, 278 
numeric format strings, 276-276 
numeric literals, 36 
in C# 7, 13 
numeric suffixes, 37 
type inference, 37 
numeric suffixes, 37 
numeric types, 36-43, 288-291 
arithmetic operators, 39 
BigInteger struct, 289 
char type conversions, 46 
Complex struct, 290 
conversions, 38, 288 
double versus decimal, 42 
8- and 16-bit integral types, 41 
increment and decrement operators, 
39 
Math class, 288 
numeric literals, 36 
Random class, 290 
real number rounding errors, 43 
special float and double values, 41 
specialized operations on integral types, 39
unification, 853 


O


object initializers, 97 
LINQ queries and, 395 
optional parameters versus, 98 
object instantiation, 115 
object orientation, 1 
object tracking, 410 
object type, 116-120 
boxing and unboxing, 117 
GetType method and typeof operator, 
118 
object member listing, 119 
static/runtime type checking, 118 
ToString method, 119 
Object... 
Object.Equals method, 298-299 
object.Equals static method, 299, 304 
object.ReferenceEquals method, 300 
object.System.Object class, 116 
objects 
heap and, 53 
implementing dynamic objects, 
860-863 
OfType operator, 457 
OperationCanceledException, 950 
OperationCompleted method, 621 
OperationStarted method, 621 
operator lifting 
equality operators (== and !=), 187 
mixing nullable and non-nullable 
operators, 188 
nullable value types, 186 
relational operators, 187 
operator overloading, 216-219 
custom implicit/explicit conversions, 
218 
operator functions, 216 
overloading equality/comparison 
operators, 217 
true/false operators, 219 
operator, defined, 26 
optional parameters, 61 
named arguments and, 991 
object initializers versus, 98 
order comparison, 249, 306-308 
OrderBy operator, 450 
OrderByDescending operator, 450 
OrderedDictionary, 348 
ordering 
comparers and collations, 451 
IOrderedEnumerable and IOrderedQueryable, 452
LINQ operators and, 450-452 
OrderBy and OrderByDescending 
arguments, 450 
PLINQ and, 929 
ordinal case-sensitive comparison, 250 
ordinal string comparison, 250 
OS security, 680-683
administrative elevation and virtualization, 682
running in a standard user account, 
681 
out parameter 
passing, 811 
type names, 801 
out parameter modifier, 58 
out variables, 14, 58 
outer joins 
GroupJoin, 447 
SelectMany, 440 
outer variables, 166-169 
overflow, 40 
overflow check operators 
for constant expressions, 41 
integral types and, 40 
overloading 
instance constructors, 94 
operator overloading, 216-219 
overloading methods, 92 
resolution and, 115 
override modifier, 111, 113 
oversubscription, 591 


P 


Parallel class, 940-946 
Parallel.For and Parallel.ForEach, 
941-946 
Parallel.Invoke, 940 
Parallel Framework (PFX), 923-926 
(see also parallel programming) 
benefits of, 924-926 
components, 925-926 
concepts, 924 
when to use, 926 
parallel programming, 923-964 
AggregateException and, 956 
BlockingCollection<T>, 961-964 
concurrent collections, 958-961 
Parallel class, 940-946 
PFX benefits, 924-926 
PLINQ (see PLINQ) 
task parallelism, 946-956 
Parallel.For, 941-946 
optimization with local values, 
945-946 
ParallelLoopState, 943 
Parallel.ForEach, 941-946
indexed, 942 
optimization with local values, 
945-946 
outer versus inner loops, 942 
ParallelLoopState, 943 
Parallel.Invoke, 940 
ParallelLoopState, 943 
parameterless constructor, 115 
parameterless constructor constraint, 141 
parameters, 53-64 
implications of passing arguments by 
reference, 59 
in modifier, 60 
named arguments and, 62 
optional parameters, 61 
out modifier, 58 
out variables and discards, 59 
params modifier, 60 
pass-by-value versus pass-by- 
reference, 93 
passing arguments by value, 56 
ref modifier, 57 
params modifier, 60 
parentheses (), 26, 65 
Parse method, 258, 270 
partial methods, 105 
partial types, 105 
passing by reference 
implications of, 59 
ref modifier, 57 
passwords 
hashing, 870 
validation, 1009 
Path class, 671-673 
pattern variable, 14, 110 
patterns, 2, 201-204 
constant pattern, 204 
positional patterns, 203 
property patterns, 201 
tuple patterns, 203 
var pattern, 204 
PE (portable executable) assembly, 757 
performance counters, 564-568 
creating counters and writing performance data, 568
enumerating the available counters, 
565 
reading data from, 566
period (.), 26, 65 
PFX (see Parallel Framework) 
PipeStream class, 648-652 
anonymous pipes, 650-652 
named pipes, 648-650 
PLINQ (Parallel LINQ), 927-940 
canceling a query, 933 
custom aggregation optimization, 
938-940 
functional purity, 932 
functionality, 925 
input-side optimization, 935-937 
limitations, 930 
optimizing, 934-940 
ordering, 929 
output-side optimization, 934 
parallel execution ballistics, 929 
parallel spellchecker example, 930-932 
setting the degree of parallelism, 933 
when to use, 932 
plug-in methods, writing with delegates, 
150 
plus sign (+) 
addition operator, 39 
combining delegate instances, 151 
in nested type names, 800 
in regular expressions, 1002 
string concatenation operator, 47 
pointer-to-member operator (->), 220, 
221 
pointers, 219-223 
basics, 220 
fixed statement and, 220 
fixed-size buffers and, 222 
stackalloc keyword and, 221 
type names, 801 
to unmanaged code, 223 
void pointer (void*), 222 
polymorphism, 107 
POP3 mail, 720-722 
portable executable (PE) assembly, 757 
ports, TCP/UDP protocols, 690 
positional patterns, 203 
positive infinity (+∞), 41
positive lookahead, 1003 
positive lookbehind, 1004 
post-phase action, 911 
#pragma warning directive, 225 
precedence, operator, 66 
predefined types, 28, 35, 55 
(see also specific types) 
predicate, 374 
preempted thread, 577 
prefixes 
namespaces and, 492 
X-DOM, 495 
preprocessor directives, 223-225, 1027 
Conditional attribute, 225 
pragma warning, 225 
primary expressions, 65 
primitive types, 35 
Priority property, 587 
private access modifier, 123 
private key, 876 
private protected access modifier, 123 
Process, 311-313, 559, 587 
examining running processes, 559 
examining threads in a process, 560 
Process.Threads property, 560
producer/consumer collection, 959 
producer/consumer queue, 962-964 
Progress<T>, 627 
projecting 
LINQ operators, 429-441 
Select method, 430-434 
SelectMany, 435-441 
into an X-DOM, 497-500 
properties, 2, 99-101 
automatic, 100 
calculated properties, 100 
CLR property implementation, 101 
emitting, 835-837 
expression-bodied, 100 
get and set accessors, 101 
property initializers, 101 
read-only, 100 
property initializer, 101 
property patterns, 201 
protected internal access modifier, 123 
proxy servers, 701 
pseudocustom attributes, 819 
public access modifier, 123 
public key, 876 
public keyword, 30 
public-key encryption/signing, 876-879 
digital signing, 878 
RSA class, 877 
punctuators, 26


Q 


quantifiers, 377 
greedy versus lazy, 1003 
LINQ, 466 
regular expressions, 997, 1002 
query continuation, 393 
query expressions, 2, 378-382, 416-420 
about, 378-380 
building, 416-420 
delegates versus expression trees, 
416-420 
expression DOM, 418 
expression trees, 416-420 
mixed-syntax queries, 381 
query syntax versus fluent syntax, 381 
query syntax versus SQL syntax, 381 
range variables, 380 
query operators, 369 
query strings, 707 
querying, X-DOM, 476-481 
question mark (?) 
in nullable types, 185 
in regular expressions, 997, 1003 
Queue, 340 
Queue<T>, 340 
quote, single (') 
enclosing char literals, 45 
following generic type names, 800 


R 


rainbow tables, 870 
Random class, 290 
Range operator, 468 
range partitioning, 935, 946 
range variables, 380 
ranges 
arrays and, 50 
using indices and ranges with indexers, 103
RCWs (Runtime-Callable Wrappers), 990 
Reactive Extensions, 923 
read locks, 898-903 
read-only properties, 100 
read-only structs, 121 
ReaderWriterLockSlim, 898-900 
readonly modifier, 90, 121 
ReadOnlyCollection<T>, 356 
ReadOnlySpan<char>, 970-972 
real literals, 37 
rectangular arrays, 51 
recursive locking, 902 
reentrancy, 609 
ref locals, 63 
ref parameter 
implicit, 992 
passing, 811 
type names, 801 
ref parameter modifier, 57 
ref returns, 63 
ref structs, 122 
refactoring, 22 
reference assemblies, 235, 1034 
reference conversions, 108-110, 146 
reference types, 33 
referential equality, 296, 298 
reflection, 797-849 
anonymously calling members of 
generic interface, 815 
awkward emission targets, 840-844 
base types and interfaces, 801 
dynamic code generation, 823-830 
emitting assemblies and types, 
830-833 
emitting constructors, 837 
emitting fields and properties, 835-837 
emitting generic methods and types, 
838-840 
emitting type members, 833-838 
obtaining a type, 798-799 
parsing IL, 844-849 
reflecting and activating types, 
798-804 
reflecting and invoking members of a 
type, 805-817 
reflecting assemblies, 817 
type names, 800 
working with attributes, 818-823 
Reflection.Emit object model, 831 
Regex... 
Regex object, 999 
Regex.Match, 998 
Regex.Replace, 1007
Regex.Split, 1008
RegexMatchTimeoutException, 998
RegexOptions, 999
RegexOptions.Compiled, 999
RegexOptions.Multiline, 1005
RegisterWaitForSingleObject, 908
regular expressions (Regex), 997-1014 
basics, 997-1002 
character escapes, 1000 
character sets, 1001 
compiled regular expressions, 999 
cookbook regular expressions, 
1009-1012 
groups, 1006-1007 
language reference, 1012-1014 
MatchEvaluator delegate, 1008 
named groups, 1007 
quantifiers, 1002 
RegexOptions, 999 
replacing and splitting text, 1007 
zero-width assertions, 1003-1006 
reimplementing interfaces, 127-129 
relational operators, 187 
remainder operator (%), 39 
Repeat operator, 468 
Replace method, 247 
reserved keywords, 25 
Resize method, 335 
ResourceManager class, 771-775 
resources files, 770-772 
resources, in assemblies, 768-775 
creating a pack URI resource in Visual 
Studio, 772 
directly embedding, 769 
resources files, 770-772 
.resx files, 771-772
restore 
generic methods, 137 
generic types, 136 
resurrection, 534-536 
.resx files, 771-772
return statement, 80 
return types, 22, 92 
rich-client applications 
application frameworks, 239 
threading in, 588-589 
right-associative operators, 66 
Rijndael class, 871, 875 
roots, garbage collection and, 530 
Roslyn compiler, 1017-1043 
architecture, 1017 
compilations and semantic models, 
1033-1043 
creating a compilation, 1034-1035 
emitting an assembly, 1035 
querying the semantic model, 
1036-1041 
scripting, 1018 
syntax trees, 1018-1033 
workspaces, 1018 
Round method, 289 
rounding errors, 43 
rounding, real to integral conversions, 283 
RSA encryption algorithm, 877 
runtime assemblies, 1034 
runtime type checking, 118 
Runtime-Callable Wrappers (RCWs), 531, 
990 
RuntimeBinderException, 211, 212 


S


satellite assemblies, 773-775 
building, 773 
cultures and subcultures, 774 
testing, 774 
Visual Studio designer support, 774 
sbyte (numeric type), 41 
sealed modifier, 113, 127 
searching within strings, 246 
security (see cryptography and encryption)
seed factory function, 938 
seed value, 938 
Select method, 430-434 
indexed projection, 431 
projecting into concrete types, 433 
select subqueries and object hierarchies, 431
subqueries and joins in EF Core, 
432-433 
selection statements, 72 
changing the flow of execution with 
braces, 73 
else clause and, 73 
if statement, 72 
switch expressions, 77 
switch statements, 74-75 
switching on types, 75 
SelectMany, 435-441 
in EF Core, 439-440
joining with, 438 
multiple range variables, 436 
outer joins with, 440 
thinking in query syntax, 437 
semantic model 
declared symbols, 1039 
looking up symbols, 1040 
symbol accessibility, 1038 
SymbolInfo, 1038 
symbols, 1037-1039 
for syntax tree, 1036-1041 
TypeInfo, 1039
semaphore, 896-898 
asynchronous semaphores and locks, 
897 
writing an EnterAsync extension 
method, 898 
semicolon (;), 22, 26 
SendAsync method, 698 
SequenceEqual method, 467 
sequences, 369 
serialization, 725-754 
attribute-based, 729-731 
binary serializer, 747-749 
concepts, 725-729 
defined, 204 
engine comparison, 726-728 
explicit versus implicit, 729 
formatters, 728 
JSON serializer, 738-747 
XML serializer, 729-738 
serialization engines, 726-728 
binary serializer, 728 
data contract serializer, 728 
IXmlSerializable hook, 728 
JsonSerializer, 727 
XmlSerializer, 727 
set accessor, 101 
set operators (LINQ), 456 
Concat and Union, 456 
Intersect and Except, 457 
SetData method, 916 
SetValue method, 330, 485 
shared memory, 982-984 
(see also memory-mapped files) 
shared state, 576 
shared writable state, 582 
shift left operator (<<), 41 
shift right operator (>>), 41 
SignalAndWait method, 909 
signaling 
event wait handles for, 903-910 
threading, 587 
signature, 92 
signing, digital, 878 
signtool.exe, 767 
single dispatch, 857 
Single operator, 461 
single-line comments, 27 
single-threaded program, 576 
single-threaded timers, 920 
Skip operator, 428 
SkipWhile operator, 429 
slicing 
defined, 965 
spans and, 966-969 
SmtpClient, 716 
SortedDictionary<TKey,TValue>, 349
SortedSet<T>, 342 
Span<T> struct, 965-969 
spans 
CopyTo and TryCopyTo, 967 
forward-only enumerators, 970-972 
slicing and, 966-969 
stack-allocated/unmanaged memory, 972
working with text, 968 
spinning, blocking versus, 579 
Split method, 248 
splitting strings, 248 
square brackets ([]) 
array declaration, 23, 48, 51 
in regular expressions, 1000 
stack, 53 
Stack, 341 
stack-allocated memory, 972 
Stack<T>, 341 
stackalloc keyword, 221 
StackFrame class, 560-562 
StackTrace class, 560-562 
StartsWith method, 246 
state, 864 
statement block, 22, 70, 166 
statements, 21, 70-81 
declaration statements, 71 
expression statements, 71 
iteration statements, 77-79 
jump statements, 79-81
selection statements, 72 
switch statements, 74-75 
static binding, 208 
static classes, 104, 193-195 
static constructors, 103 
static local methods, 93 
static members 
instance versus, 29 
locking, 892 
static methods, 193-195 
static readonly field, 91 
static type checking, 118 
static types, 214 
static typing, 2 
static variable flags, 552 
Stop method, 525 
Stopwatch class, 569 
StorageFile class, 676-677 
StorageFolder class, 676-677 
stream adapters, 653-661 
binary adapters, 659 
closing and disposing, 660 
text adapters, 654-658 
Stream... 
Stream class, 639 
StreamReader, 656, 719 
StreamSocket class, 722 
StreamSocketListener class, 722 
StreamWriter, 656, 719 
streams and I/O, 637-686 
backing store streams, 643 
BufferedStream, 652 
closing and flushing, 642 
compression streams, 661-664 
file and directory operations, 665-675 
file I/O in UWP, 676-680 
FileStream, 644-647 
memory-mapped files, 683-686 
MemoryStream, 647 
OS security, 680-683 
PipeStream class, 648-652 
reading and writing, 641 
seeking, 642 
stream adapters, 653-661 
stream architecture, 637-639 
thread safety, 643 
timeouts, 643 
using streams, 639-653 
ZIP files, 664 
string type, 27, 46-48, 245-256 
accessing characters within, 246 
comparing, 249-252 
constructing strings, 245 
enum conversions, 294 
manipulating, 247 
null/empty, 246 
ordinal versus culture comparison, 250
searching within strings, 246 
spans and, 968 
splitting/joining, 248 
string equality comparisons, 250 
string order comparison, 251 
String.Format and composite format
strings, 248 
StringBuilder class, 252 
text encodings and Unicode, 253-256 
String... 
string.Format, 248, 273 
string.IsNullOrEmpty, 246
StringBuilder class, 236, 252 
StringComparer, 365 
StringComparison enum, 251 
StringInfo class, 256 
StringReader, 658 
StringSplitOptions enum, 248 
StringWriter, 658
strongly named assemblies, 762 
strongly typed language, 2 
struct constraint, 141 
structs, 120-122 
construction semantics, 120 
mapping to unmanaged memory, 
984-988 
mapping to unmanaged method, 
978-980 
read-only structs/functions, 121 
ref structs, 122 
speeding up equality comparison with, 
302 
structural comparison, 366 
structural equality, 296 
structured parallelism, 924 
subclass 
reimplementing an interface in, 
127-129 
subclassed collection elements, 736 
subclasses and child objects, 731-734
subclassing child objects, 733 
subclassing generic types, 141 
subclassing serializable classes, 753 
subclassing the root type, 731 
subcultures, 774 
subqueries, 388-391 
deferred execution and, 391 
select subqueries and object hierarchies, 431
subqueries and joins in EF Core, 
432-433 
subscribers, 158 
Substring method, 247 
Sum operator, 463 
surrogate pairs, 255 
switch expressions, 77 
switch statements, 74-75, 204 
SymbolInfo, 1038
symbols 
renaming a symbol, 1041-1043 
syntax trees and, 1037-1038 
symmetric encryption, 871-876 
chaining encryption streams, 873 
disposing encryption objects, 875 
encrypting in memory, 872 
key management, 875 
synchronization context scheduler, 955 
synchronization object, 885 
SynchronizationContext class, 589 
synchronous call graph, 602 
synchronous completion, 621-623 
syntax trees, Roslyn, 1018-1033 
common properties and methods, 
1021 
CSharpSyntaxRewriter, 1032 
CSharpSyntaxWalker, 1026
declared symbols, 1039 
finding a child by its offset, 1025 
handling changes to source code, 1030 
looking up symbols, 1040 
node types, 1020 
obtaining, 1022 
preprocessor directives, 1027 
renaming a symbol, 1041-1043 
structure, 1019-1021 
structured trivia, 1028 
symbol accessibility, 1038 
SymbolInfo, 1038 
symbols, 1037-1038 
SyntaxFactory and, 1030-1032 
transforming, 1029-1033 
traversing children, 1023-1025 
traversing parents, 1025 
traversing/searching a tree, 1023-1026 
trivia, 1027-1029 
TypeInfo, 1039
working with TextSpan, 1025 
syntax, C#, 24-27 
comments, 27 
identifiers and keywords, 25-26 
literals, punctuators, and operators, 26 
Syntax... 
SyntaxFactory, 1030-1032 
SyntaxFactory.Token, 1031 
SyntaxNode, 1019, 1023-1025, 1029 
SyntaxToken, 1019 
SyntaxTree, 1025 
SyntaxTrivia, 1019, 1025, 1028 
System... 
System.AppContext, 313 
System.ArgumentException, 177 
System.ArgumentNullException, 178 
System.ArgumentOutOfRangeException, 178
System.Attribute, 205, 819, 820 
System.Buffers, 541 
System.Buffers.MemoryPool<T>, 970 
System.Buffers.Text, 969 
System.Char, 243 
System.Collections.*, 749 
System.Collections.Concurrent, 535, 
958 
System.Collections.Generic, 180 
System.Collections.Generic.IEnumerable<T>, 179
System.Collections.Generic.IEnumerator<T>, 179
System.Collections.IEnumerable, 179, 180
System.Collections.IEnumerator, 179
System.Collections.Immutable, 357 
System.ComponentModel, 589, 634 
System.Core.dll, 852 
System.Data, 526 
System.Data.IDataRecord, 862 
System.Data.SqlTypes.SqlBoolean, 219 
System.Delegate, 152 
System.Diagnostics, 311-313, 558, 587
System.Diagnostics.PerformanceCounter, 564
System.Drawing, 526 
System.Dynamic, 851 
System.Enum, 291 
System.Environment class, 310 
System.EventArgs, 160 
System.EventHandler<>, 161 
System.Exception, 172, 177 
System.GC.Collect, 536 
System.Globalization, 256, 260 
System.Globalization.CultureInfo, 775 
System.IDisposable, 174 
System.InvalidOperationException, 
178 
System.IO, 526, 665, 687 
System.IO.Compression, 661, 664
System.Linq.Enumerable class, 812
System.Linq.Expressions, 1018
System.Management, 667 
System.MulticastDelegate, 152 
System.Net, 609, 687, 690 
System.Net.Mail, 716 
System.NotImplementedException, 
178 
System.NotSupportedException, 178 
System.Nullable<T>, 185 
System.Object, 116, 303 
System.ObjectDisposedException, 178 
System.Reflection, 761 
System.Reflection.Emit, 797, 823, 831 
System.Runtime, 570 
System.Runtime.CompilerServices, 
851 
System.Runtime.InteropServices, 994 
System.Runtime.Loader, 776 
System.Runtime.Serialization.Formatters.Binary, 748
System.Security.AccessControl, 682 
System.Security.Cryptography, 291 
System.Security.Cryptography.X509Certificates, 868, 879
System.Security.Cryptography.Xml, 
868 
System.String, 245, 307 
System.Text, 236, 254, 1030 
System.Text.Encodings.Web.JavaScriptEncoder, 747
System.Text.Json, 738
System.Text.Json.JsonDocument, 519
System.Text.Json.Serialization, 741
System.Text.Json.Utf8JsonReader,
516-518
System.Text.Json.Utf8JsonWriter, 518
System.Text.RegularExpressions, 236,
997
System.Threading, 544
System.Threading.Channels.Channel,
923
System.Threading.Tasks, 593
System.Threading.Timer, 919
System.Timers, 543, 919
System.Timers.Timer, 540
System.Tuple, 200
System.Type, 118, 798
System.Uri class, 202
System.WeakReference class, 545
System.Windows.Forms.Timer, 920
System.Windows.Threading.DispatcherTimer, 920
System.Xml.Linq, 483, 750
System.Xml.Serialization, 205, 729


T


Take operator, 428 
TakeWhile operator, 429 
TAP (Task-Based Asynchronous Pattern), 629
task combinators, 629-633
custom combinators, 631
WhenAll, 630
WhenAny, 629
task parallelism, 946-956
canceling tasks, 949
child tasks, 948
continuations, 950-954
creating and starting tasks, 947
defined, 924
specifying a state object, 947
task schedulers, 955
TaskCreationOptions, 948
TaskFactory, 955
waiting on multiple tasks, 949
task schedulers, 955
Task-Based Asynchronous Pattern (TAP), 629
Task...
Task class, 592 
(see also tasks) 
Task.Delay, 600 
Task.Factory, 955 
Task.Factory.StartNew, 599, 947 
Task.Run, 593, 602, 947 
Task.WaitAll, 949 
Task.WaitAny, 949
Task.WhenAll, 630 
Task.WhenAny, 629 
Task<T>, 617 
Task<TResult>, 594, 612, 951 
TaskCanceledException, 950 
TaskCompletionSource, 598-600, 604, 
612, 614 
TaskCreationOptions, 948 
TaskCreationOptions.LongRunning, 
594, 599 
TaskFactory object, 955 
TaskScheduler.UnobservedTaskException, 596
tasks, 592-600 
asynchronous programming and, 602 
continuations, 596-597 
exceptions, 595 
exceptions and autonomous tasks, 595 
long-running, 594 
returning values, 594 
starting a task, 593-594 
TaskCompletionSource, 598-600 
Wait method, 593 
TCP (Transmission Control Protocol)
basics, 717-720 
concurrency with, 719 
receiving POP3 mail with, 720-722 
in UWP, 722-723 
TcpClient, 717-720 
TcpListener, 717-720 
text 
MatchEvaluator delegate, 1008 
replacing and splitting with regular 
expressions, 1007 
spans and, 968 
text adapters, 654-658 
character encodings, 657-658 
StreamReader and StreamWriter, 656 
StringReader and StringWriter, 658 
text encoding, 253-256 
encoding to byte arrays, 255 
file and stream I/O, 255 
obtaining an Encoding object, 254 
UTF-16 and surrogate pairs, 255 
text handling, 243-245 
char type, 243 
text encodings and Unicode, 253-256 
TextSpan, 1025 
ThenBy operator, 451 
ThenByDescending operator, 451 
thin-client applications, 239 
this keyword, 94 
this reference, 98 
thread execution barrier, 910 
thread pool, 590 
entering, 591 
hygiene in, 591 
thread safety, 643 
thread-local storage, 914 
AsyncLocal<T>, 917 
GetData and SetData, 916 
ThreadLocal<T>, 915 
ThreadStatic attribute, 915 
thread-safe code, 582 
thread-safe objects, 892 
thread-unsafe operations, 932 
Thread... 
Thread object, 576 
Thread.Sleep, 578 
ThreadLocal<T>, 915, 931 
ThreadPool.RegisterWaitForSingleObject, 908
ThreadStart delegate, 576 
ThreadStatic attribute, 915 
threading, 576-592, 881-921 
advanced topics, 881-921 
Barrier class, 910-911 
blocking, 578 
blocking versus spinning, 579 
creating a thread, 576-578 
exception handling, 584 
exclusive locking, 882-890 
foreground versus background 
threads, 586 
I/O-bound versus compute-bound
operations, 579 
join and sleep, 578 
lambda expressions and captured vari- 
ables, 584 
lazy initialization, 912-914
limitations of, 592 
local versus shared state, 580-582 
locking and thread safety, 582, 
890-895 
multiple UI threads, 589 
nonexclusive locking, 896-903 
passing data to a thread, 583-584 
Priority property, 587 
in rich-client applications, 588-589 
signaling, 587 
signaling with event wait handles, 
903-910 
synchronization contexts, 589 
synchronization overview, 882 
thread pool, 590 
thread-local storage, 914 
timers, 918-921 
threads 
defined, 576 
examining threads in a process, 560 
throw expressions, 16, 176 
throw statement, 81 
throwing exceptions, 175-177 
rethrowing exceptions, 176 
throw expressions, 176 
tiered compilation, 232 
time zones, 264-269 
DateTime and, 264 
DateTimeOffset and, 265 
Daylight Saving Time and DateTime, 
268 
TimeZone and TimeZoneInfo,
265-268 
time-stamping server, 768 
timeouts, 643 
timers, 918-921 
memory leaks and, 543 
multithreaded, 919-920 
single-threaded, 920 
TimeSpan, 256 
TimeZone, 265-268 
TimeZone.CurrentTimeZone method, 
265 
TimeZoneInfo, 265-268
To... 
ToArray operator, 459 
ToDictionary operator, 459 
ToHashSet operator, 459 
ToList operator, 459
ToLocalTime, 264
ToLongDateString method, 263
ToLookup operator, 459
ToLower, 244, 248
ToShortDateString method, 263
ToString method, 119, 252, 263, 270,
473
ToUniversalTime, 264, 269
ToUpper, 244, 248
Trace class (see Debug and Trace classes)
TraceFilter, 557
TraceListener, 556
traces, 571
transport layer, 687
trivia
preprocessor directives, 1027
structured, 1028
syntax trees and, 1027-1029
unstructured, 1028
try statements and exceptions, 170-179
alternatives to exceptions, 179
catch clause, 172-173
common exception types, 177
finally block, 174
key properties of System.Exception,
177
throwing exceptions, 175-177
try/catch/finally blocks, 183
TryXXX method pattern, 178
using declarations, 175
using statement, 174
TryCopyTo method, 967
TryEnter method, 884
TryParse, 258, 270
TryXXX method pattern, 178
tuple literal, 197
tuple patterns, 203
tuple types, 197
TupleElementNamesAttribute, 199
tuples, 197-200
deconstructing, 199
equality comparison, 200
in C# 7, 16
naming tuple elements, 198
System.Tuple classes, 200
type erasure, 198
ValueTuple.Create, 199
type arguments, 136 
type checking, 118
type converters, 285 
type equivalence, 994 
type library importer, 990 
type marshaling, 976-980 
in and out marshaling, 980 
marshaling classes and structs, 
978-980 
marshaling common types, 976-978 
type members, emitting, 833-838 
type parameters 
covariance, 145 
declaring, 138 
generic types and, 136 
type patterns, 14 
type safety, 2, 133 
type system, C# 
access modifiers, 123-124 
anonymous types, 195 
basics, 27-35 
Boolean type and operators, 43-45 
C# members versus CLR members, 
808 
converting types, 30 
creating types, 89-148 
custom type examples, 28-30 
dynamically invoking a member, 810 
emitting assemblies and types, 
830-833 
enums, 131-134 
extension methods, 193-195 
generics, 135-148 
inheritance (see inheritance) 
instantiating a type, 802-803 
interfaces, 125-130 
nested types, 134-135 
numeric types, 36-43 
object type, 116-120 
predefined type examples, 27 
strings and characters, 45-48 
structs, 120-122 
value types versus reference types, 
31-35
type unification, 116, 292 
Type... 
TypeAttributes, 831 
TypeBuilder, 831, 832, 835 
TypeIdentifierAttribute, 994
TypeInfo class, 799, 805, 1039


typeof operator, 118, 139 

types 
aliasing within namespaces, 85 
base types and interfaces, 801 
partial types/methods, 105 
reflecting and activating, 798-804 
reflecting and invoking members of,
805-817


U 


UAC (User Account Control), 680 
UI (see user interface) 
unbound generic type, 139 
unboxing, 117 
is operator and, 110 
nullable values, 186 
#undef directive, 552 
Unicode, 253-256 
UTF-16 and surrogate pairs, 255 
XmlWriter and, 490 
UnicodeCategory enum, 245 
unified type system, 1 
union, 981 
Union operator, 456 
Universal Windows Platform (UWP) (see 
UWP) 
Unix gzip file compression, 663 
Unix, OS security, 680 
unmanaged code 
callbacks from, 980 
pointers to, 223 
unmanaged constraint, 141 
unmanaged heap, 985 
unmanaged memory, 972, 984-988 
UnmanagedType, 976 
unnamed methods, 616 
/unsafe compiler option, 219 
unsafe code, 219-223 
unseeded aggregations, 464 
upcasting, 108 
upgradeable locks, 901-903 
UploadValues method, 707 
Uri class, 690-692 
URIs, 690-692 
User Account Control (UAC), 680 
user interface (UI) 
awaiting in, 608-610 
multiple UI threads, 589 
UseShellExecute, 311

ushort (numeric type), 35 

using declarations, 175 

using directive, 82, 85 

using statement, 81 

using static directive, 83 

UTC (Coordinated Universal Time), 258, 
265 

Utf8JsonReader, 516-518 

Utf8JsonWriter, 518 

utility classes, 309-313 
AppContext, 313 
Console, 309 
Environment, 310 
Process, 311-313 

UWP (Universal Windows Platform), 241 
about, 5 
application folder, 678 
downloads folder, 679 
file I/O in, 676-680 
isolated storage, 678 
KnownFolders class, 678 
obtaining directories and files, 677 
removable devices, 679 
TCP in, 722-723 
user-selected files and folders, 679 
working with directories, 676 
working with files, 677 


V 


value equality, 296 
value types, 32 
ValueTask<T>, 617, 623 
ValueTuple.Create, 199 
ValueTuple<string,int>, 199-200 
var keyword, 64 
var pattern, 204 
var type, dynamic type versus, 212 
variables, 53-64 

(see also parameters) 

default values, 55 

definite assignment and, 54 

heap, 53 

purpose of, 27 

ref locals, 63 

ref returns, 63 

stack and, 53 

var keyword, 64 


verbatim string literals, 47 
vertical bar (|) 
bitwise OR operator, 41, 132 
in regular expressions, 998 
view accessors, 685 
virtual function members, 111 
virtualization, 682 
Visitor pattern, 854-857 
void expressions, 65 
void pointer (void*), 222 
volume information, querying, 674 


W 


wait handles (see event wait handles) 
Wait method, 593 
WaitAll method, 909 
WaitAny method, 909 
#warning preprocessor directive, 224 
WCF (Windows Communication Foundation), 727
weak references 
caching and, 546 
events and, 547-549 
GC and, 545-549 
WebClient, 693-694 
custom headers, 706 
uploading form data, 707 
WebException, 704 
WebExceptionStatus enum, 704 
WebRequest, 695-696 
custom headers, 706 
uploading form data, 708 
WebRequest.RegisterPrefix, 696
WebRequestMethods.Ftp, 713 
WebResponse, 695-696 
Where clause, 426-428 
Enumerable.Where implementation, 
426 
indexed filtering, 427 
WHERE clause, 428 
while loops, 77 
wildcards (character sets), 1001 
Windows 
application manifest, 759 
memory-mapped files and shared 
memory, 684 
OS security, 680 
Windows Communication Foundation
(WCF), 727 

Windows Data Protection API (DPAPI), 
868 

Windows Desktop application framework, 240

Windows event logs, 562-564 
monitoring, 564 
reading, 563 
writing to, 562 

Windows Forms, 241 

Windows Management Instrumentation 
(WMI) API, 667 

Windows Runtime (WinRT) 
asynchronous methods in, 619 
event semantics in, 160 
garbage collection and, 531 
interoperability with C#, 7 

Windows.Networking.Sockets, 722 

WithDegreeOfParallelism, 933 

WithMergeOptions, 929 

WMI (Windows Management Instrumentation) API, 667

word boundary assertions, 1005 

WPF, 240 

write locks, 898-903 


X 


x++ (incrementing), 582 

Xamarin, 5, 242 

XAML (Extensible Application Markup 
Language), 285 

XAML files, 772 

XAttribute, 474 

XContainer, 472 

XDeclaration object, 487 

XDocument, 472, 487-489 

XElement, 470-474 
using XmlReader with, 514 
using XmlWriter with, 515 

XML declarations, 489 

XML documentation, 226-229 
standard tags, 226-228 
type or member cross-references, 229 
user-defined tags, 228 

XML DOM (X-DOM), 470 
attribute navigation, 481 
automatic deep cloning, 476 


automatic XText concatenation, 487 

child node navigation, 477 

content specification, 475 

default namespaces, 494 

functional construction, 474 

getting values, 485 

instantiating, 474-476 

loading and parsing, 472 

mixing XmlReader/XmlWriter with, 
514 

namespace specification, 493 

navigating and querying, 476-481 

overview, 470-474 

parent navigation, 480 

peer node navigation, 481 

prefixes, 495 

projecting into, 497-500 

removing a sequence of nodes or 
attributes, 483 

retrieving a single element, 479 

retrieving descendants, 479 

retrieving elements, 478 

saving and serializing, 473 

setting values, 485 

simple value updates, 482 

updating, 481-484 

updating child nodes and attributes, 
482 

updating through the parent, 483 

values and mixed content nodes, 486 

working with values, 484-487 


XML serializer, 729-738 
XmlConvert, 284 
XmlReader, 501-509 


mixing with an X-DOM, 514 

namespaces and prefixes, 508 

patterns for using, 511-516 

reading attributes, 507 

reading elements, 503-507 

reading nodes, 502 

using with XElement, 514 

working with hierarchical data, 
511-514 


XmlSerializer, 729-738 


attribute-based serialization, 729-731 

attributes, names, and namespaces, 
730 

basics, 727 

IXmlSerializable, 736-738 
serializing child objects, 732
serializing collections, 734-736 
subclassed collection elements, 736 
subclasses and child objects, 731-734 
subclassing child objects, 733 
subclassing the root type, 731 
XML element order, 731 
XmlWriter, 509 
mixing with an X-DOM, 514 
patterns for using, 511-516 
using with XElement, 515 
working with hierarchical data, 
511-514 
writing a declaration to a string, 490 
XNode, 472 
XObject, 470, 496 
XStreamingElement, 500 


XText, 487 


Y


yield break statement, 182 


Z 


zero-width assertions, 1003-1006 
anchors, 1004 
defined, 1003 
lookahead and lookbehind, 1003 
word boundaries, 1005 

ZIP files, 664 

Zip operator, 449 

ZipArchive class, 664

ZipFile class, 664
ing from Woods Illustrated Natural History. The cover fonts are Gilroy Semibold
and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe 
Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono. 